Author - Adrian Sossna

3 Jul

BLOG: CGRA vs. CGLA - Why CGLA is the Better Architecture

In recent years, the evolution of AI has shown no signs of slowing down. Supporting this AI revolution are specialized "AI chips" designed specifically for AI computations. Lately, a growing number of startups developing new AI chips have adopted an architecture called "CGRA."

However, an architecture that solves the major challenges of CGRA and takes things a step further—"CGLA"—remains largely unknown.

This article will explain the difference between CGRA and CGLA for beginners in the semiconductor field, and introduce the advantages and features that give CGLA the potential to dominate the AI era.

What is a CGRA?

Let's start with the basics. What is a CGRA?

In a nutshell, a CGRA (Coarse-Grained Reconfigurable Array) is like a "computer made of LEGO blocks that can be reshaped to fit the program."

If a general-purpose CPU is a "universal factory that can handle any calculation," then a CGRA is like a "specialized factory whose production lines (computing circuits) can be freely reconfigured to make a specific product (perform a specific computation)."

By arranging numerous small processing units and changing their connections according to the program, a CGRA can execute specific tasks such as AI image recognition or data analysis far faster, and with less power, than a CPU.

This is why many AI chip startups are focusing on CGRA. However, this "reconfiguring the production line" process has a major weakness.

The "Major Weakness" of CGRA

The weakness of CGRA is its "compilation time."

Compilation is the process of converting a program written by a human into a blueprint that the hardware can understand. For a CGRA, this is equivalent to figuring out the optimal layout of the production line to achieve maximum efficiency.

Finding the best placement and connection scheme for countless units is like solving an incredibly complex puzzle. As a result, it's not uncommon for CGRA compilation to take several hours, sometimes even more than a full day.

This means that every time you want to try a new AI model or make a small change to a program, it becomes a day-long affair. This is a critical flaw in today's world, where development speed is everything.
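
To get a feel for why this puzzle is so hard, the sketch below counts only the possible placements of operations onto processing units, ignoring routing entirely. The function name and the example sizes are hypothetical choices for illustration. Real CGRA compilers rely on heuristics rather than enumerating every option, so this shows the size of the search space, not how any actual tool works.

```python
import math


def placement_count(num_ops: int, num_units: int) -> int:
    """Count the ways to assign num_ops distinct operations to distinct units.

    This is the number of ordered selections of num_ops units out of
    num_units. Routing choices between the units are NOT counted, so the
    real search space is larger still.
    """
    return math.perm(num_units, num_ops)


if __name__ == "__main__":
    # Hypothetical problem sizes: (operations, processing units).
    for ops, units in [(4, 16), (16, 64), (32, 64)]:
        print(f"{ops} ops on {units} units: {placement_count(ops, units):.3e} placements")
```

Even at these small sizes the count grows factorially, which is why a compiler that has to search such a space can run for a very long time.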

Enter the Savior: CGLA

This is where CGLA (Coarse-Grained Linear Array) comes in. As its name suggests, it is an architecture in which processing units are connected in a linear, or ring-like, fashion, and it is packed with ideas that address CGRA's challenges. "IMAX," developed at NAIST (Nara Institute of Science and Technology), is a prime example of a CGLA.

CGLA's Advantage #1: Compilation Time in "Seconds"

Why is CGLA so fast to compile? The secret lies in the "linear (ring-like)" structure that gives it its name.

  • CGRA: Units are arranged in a complex 2D grid, making it very difficult to find the optimal routes.
  • CGLA: Units are connected in a simple ring (linear) structure, which eliminates the need for the compiler to search for complex routes, making the program "mapping" process extremely easy.

This dramatically reduces compilation time from hours to just a few seconds. For developers, this is a revolutionary change.
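
The idea that a linear structure makes mapping easy can be illustrated with a small sketch. It assumes the program is given as a dependency graph and that data only flows forward along the array, so each operation just needs a stage later than all of its inputs. The function name and the graph format are hypothetical, and this is not IMAX's actual compiler. It also ignores details such as forwarding a value across intermediate stages.

```python
from graphlib import TopologicalSorter


def map_to_linear_stages(deps: dict[str, list[str]]) -> dict[str, int]:
    """Assign every node of a dataflow graph to a stage of a linear array.

    deps maps each operation to the list of values it consumes.
    A node with no inputs goes to stage 0; every other node goes one
    stage after its latest input. One pass over the graph is enough,
    with no search over alternative placements.
    """
    stage: dict[str, int] = {}
    for node in TopologicalSorter(deps).static_order():
        inputs = deps.get(node, [])
        stage[node] = 1 + max((stage[i] for i in inputs), default=-1)
    return stage


if __name__ == "__main__":
    # y = (a * b) + c, written as a dependency graph.
    graph = {"mul": ["a", "b"], "add": ["mul", "c"]}
    print(map_to_linear_stages(graph))
```

For the example graph, the inputs land in stage 0, the multiply in stage 1, and the add in stage 2. The work is proportional to the size of the graph, in contrast to the combinatorial search a 2D placement can require.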

CGLA's Advantage #2: The Pipeline "Never" Stalls

One of the biggest drains on a processor's performance is the "pipeline stall," which occurs when the pipeline has to wait for data.

The CGLA (like IMAX) uses the dataflow principle and multithreading technology within its units to eliminate stalls by design.

This is like a factory production line where parts always arrive at the worker's station at the exact moment they are needed. Because the hardware continues to operate without any waste, it maintains extremely high computational efficiency.
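
The effect of stalls on efficiency can be made concrete with the standard textbook model of a pipeline: once the pipeline is full, one result completes per cycle, and every stall cycle adds directly to the total. The function names and the numbers used below are hypothetical and are not measurements of IMAX or of any CGRA.

```python
def pipeline_cycles(depth: int, items: int, stall_cycles: int = 0) -> int:
    """Total cycles to push `items` through a pipeline with `depth` stages.

    The first item needs `depth` cycles to fill the pipeline, each further
    item completes one cycle later, and stall cycles are added on top.
    """
    if items == 0:
        return 0
    return depth + items - 1 + stall_cycles


def utilization(depth: int, items: int, stall_cycles: int = 0) -> float:
    """Fraction of cycles in which a result is produced."""
    total = pipeline_cycles(depth, items, stall_cycles)
    return items / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical workload: 1000 items through an 8-stage pipeline.
    print("no stalls:  ", round(utilization(8, 1000), 3))
    print("500 stalls: ", round(utilization(8, 1000, stall_cycles=500), 3))
```

In this model, a stall-free pipeline approaches one result per cycle as the number of items grows, while any waiting for data lowers utilization in direct proportion.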

CGLA's Advantage #3: Thoroughly Energy-Efficient Design

Moving data is a major source of power consumption. The CGLA (like IMAX) places a large cache memory right next to each processing unit.

This is like each worker on the assembly line having their own personal, large toolbox and parts bin. Since there's no need to constantly walk to a central warehouse for parts, data movement is minimized, leading to significant energy savings.

Summary: CGRA vs. CGLA

| Feature | Typical CGRA | CGLA (e.g., IMAX) |
| --- | --- | --- |
| Unit Connection | Complex 2D Mesh Structure | Simple Linear (Ring) Structure |
| Compilation Time | Hours to 1+ Day | Seconds |
| Pipeline Efficiency | Prone to Stalls | Stall-less |
| Memory Structure | Frequent access to shared memory | Local cache per unit, minimal data movement |
| Development Efficiency | Low | Extremely High |

As you can see, CGLA is truly a next-generation architecture that refines the CGRA concept and solves its practical challenges.

While startups using CGRA are making waves in the AI chip market today, more advanced technologies like CGLA are steadily moving towards practical application. The day that CGLA—with its combined development efficiency, execution efficiency, and energy savings—becomes the standard for future AI chip development may not be far off. 


19 Jun

BLOG: The 'Reconfigurable' Chip that Dominates Both AI and Crypto - The True Value of CGLA

Today's digital world is powered by specialized chips dedicated to specific purposes: AI chips for AI, GPUs for graphics processing, and mining-specific ASICs (Application-Specific Integrated Circuits) for cryptocurrency. The conventional wisdom has been to build dedicated hardware to achieve the best performance for each task.

But what if a single chip could change its form like a chameleon, becoming both an AI accelerator and a cryptocurrency mining machine? The architecture that makes this concept a reality is the CGLA (Coarse-Grained Linear Array), which we introduced in a previous article.

This time, we will focus on CGLA's most powerful weapon, "Reconfigurability," and explore its true value.

What is "Reconfigurability"? The "Transformation Ability" of the Digital World

To understand "reconfigurability," imagine an orchestra.
  • Dedicated Chip (ASIC): This is a "string quartet" assembled solely to perform one specific piece of music perfectly. They are the best in the world at that one piece, but they cannot play any other music at all.
  • CPU: This is a small "jazz band" that can handle any tune reasonably well. It's versatile, but lacks the power to perform a grand symphony.
  • CGLA: This is a "full orchestra" of musicians who can play any instrument according to the conductor's (program's) instructions. It can reconfigure the optimal instrument setup (circuit configuration) and sheet music (dataflow) to match the concert's program (computation task).
Thus, CGLA's reconfigurability is the ability to reconfigure the hardware circuit itself into the most logically efficient form according to the program being executed.

CGLA IN ACTION ①: As an AI Accelerator

The computations in AI, especially in Large Language Models (LLMs), are not just simple repetitions of addition and multiplication. They are complex combinations of various calculations such as "matrix multiplication" and "convolution." CGLA (like IMAX) optimizes its internal data paths to match these AI computations.
  • When a matrix multiplication begins, just as an orchestra would create a magnificent harmony centered on strings and wind instruments, the CGLA forms an optimal pipeline for numerous processing units to collaborate on the matrix calculation.
  • By designing a seamless flow of data, it executes AI processes with extremely high power efficiency, without any pipeline stalls.
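
As a rough software model of the matrix multiplication pipeline described above, each output element can be seen as a partial sum handed from unit to unit, with every unit performing one multiply-accumulate. The function names are hypothetical, and the code only illustrates the dataflow idea. It makes no claim about how IMAX actually maps matrix multiplication onto its units.

```python
def mac_chain(row: list[float], col: list[float]) -> float:
    """Dot product as a chain of multiply-accumulate stages.

    Each loop iteration stands in for one processing unit: it receives
    the partial sum from the previous unit, adds its own product, and
    passes the result on.
    """
    partial = 0
    for a, b in zip(row, col):
        partial += a * b
    return partial


def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Matrix product built from one MAC chain per output element."""
    cols = list(zip(*B))  # columns of B
    return [[mac_chain(row, list(col)) for col in cols] for row in A]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In hardware the chains for different output elements would overlap in time, so a new element can enter the chain every cycle. The sequential loops here only show which value flows where.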
This is why CGLA demonstrates superior performance as a state-of-the-art AI chip. But what about in the completely different world of cryptocurrency mining?

CGLA IN ACTION ②: As a Crypto Mining Machine

The computation required for cryptocurrency mining is very simple: repeat a hash calculation like "SHA-256" over and over again, an incredible number of times. For this reason, mining-specific ASICs pack as many circuits as possible onto a chip dedicated solely to this calculation. At first glance, this seems like a completely different universe from AI computation. However, this is where the "reconfigurability" of CGLA truly shines. This architecture can completely transform its internal structure to suit the task of mining.
  • It dismantles the complex pipelines used for AI calculations and reconfigures its numerous processing units to act as independent, parallel hash calculators.
  • This is like the orchestra transforming into a percussion ensemble, where every musician beats the same drum relentlessly.
  • IMAX has a special operating mode called "REFILL mode," a mechanism designed to optimize the data supply for simple, high-throughput calculations such as mining.
In other words, the same hardware that was running the latest AI model yesterday can be transformed into a high-performance mining machine today with a single compile (reconfiguration). This aligns with the design philosophy of the "LiCryptor" crypto-accelerator presented in the paper by Hoai Luan Pham and his colleagues.
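
To show how simple and repetitive the mining workload is, here is a minimal software sketch of a Bitcoin-style search: hash a header plus a counter (the nonce) with double SHA-256 until the result falls below a target. The function names, the toy header, and the difficulty are assumptions for illustration. This is not LiCryptor's or IMAX's implementation, and a real miner works on a full block header at a far higher difficulty.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used in Bitcoin-style proof of work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def find_nonce(header: bytes, zero_bits: int, max_nonce: int = 1_000_000):
    """Search for a nonce whose hash starts with `zero_bits` zero bits.

    Returns the first matching nonce, or None if none is found below
    max_nonce. The nonce is encoded as 4 little-endian bytes, so
    max_nonce must not exceed 2**32.
    """
    target = 1 << (256 - zero_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None


if __name__ == "__main__":
    # Toy difficulty: 12 leading zero bits on a made-up header.
    print(find_nonce(b"example header", zero_bits=12))
```

Every iteration is independent of the others, which is why the work can be spread across many processing units that each try a different range of nonces.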

The True Advantage of CGLA: Adaptability to the Future

The greatest benefit of this "reconfigurability" is adaptability to the future.
  • Future-Proof: Both AI algorithms and cryptocurrency algorithms evolve daily. When an algorithm changes, a dedicated ASIC becomes a worthless paperweight. But with CGLA, you can simply recompile the program, and the same chip is reborn as hardware optimized for the new algorithm.
  • Economic Efficiency: A single chip design can target two completely different markets: AI and cryptocurrency. A data center could use CGLA for corporate AI development during the day, and then reallocate those resources for mining or scientific computing at night.

Conclusion

CGLA, an evolution of the CGRA concept, overturns the semiconductor industry's paradigm of "one task, one dedicated chip" through its reconfigurability. The flexibility to target two huge markets—AI and crypto—with the same hardware, and the adaptability to handle future changes, is why CGLA is the most promising candidate for the next generation of computing infrastructure. In the digital society of the future, survival may not belong to the strongest chip, but to the one that is most adaptable to change. CGLA is an architecture designed for exactly that change.