SUSTAINABLE AI & CRYPTO MINING HARDWARE
DESIGNED IN JAPAN
Lenzo builds ultra-efficient hardware designed for modern parallel workloads — from blockchain to AI inference. Our proprietary CGLA architecture delivers breakthrough performance per watt in a compact, cost-effective form.

COMPUTE ENGINES FOR THE AI & CRYPTO FRONTIER
KEY BENEFITS

ABOUT
Crafted in Japan, Lenzo is built by the engineers behind some of the world’s fastest chips—from Sony’s PlayStation CPU/GPU teams to the designers of Fujitsu’s supercomputers. Our founding team includes researchers from NAIST and ITRI Taiwan, with deep expertise in large-scale compute infrastructure. We are system architects and builders, committed to engineering high-performance, energy-efficient hardware for a new generation of workloads.
TECHNICAL RESOURCES
LENZO IN THE MEDIA: Featured on XenoSpectrum
On August 27th, Y Kobayashi published an article about Lenzo on XenoSpectrum. The original article is available only in Japanese; an English translation follows:
Engineers who worked on the development of the PlayStation 2's Emotion Engine and the PlayStation 3's Cell Broadband Engine are seeking to revolutionize the world of computing. The startup they founded, Lenzo, is developing a new architecture called CGLA, which looks ahead to the age of AI and cryptocurrencies. This architecture has already demonstrated power efficiency that surpasses NVIDIA GPUs in cryptocurrency mining.
Could this signal a breakthrough in the stagnation of semiconductor design that has lasted for 30 years? This article delves into the core of this technology and its vision.
Built with Expertise: Lenzo envisions the next generation of computing
Japan's semiconductor industry has long since lost its former glory. However, Lenzo, a certified startup from the Nara Institute of Science and Technology (NAIST), has stepped forward with technology that has the potential to overturn conventional wisdom in the industry.
CEO Kenshin Fujiwara previously worked at Sony Computer Entertainment (now Sony Interactive Entertainment) developing the CPU/GPU for PlayStation 2 and 3. Co-founder and chief architect Professor Yasuhiko Nakajima is a true authority on computer science, having been involved in the development of supercomputer processors at Fujitsu.

What they are proposing to the world is a completely new semiconductor architecture called CGLA (Coarse-Grained Linear Array).
What is "CGLA"? A brief introduction to a new approach
To understand the innovation of CGLA, we must first look at the fundamental challenges facing modern computers.
Why is a new architecture needed now?
Most of the processors (CPU/GPU) in the PCs and smartphones we use every day are based on a basic design known as the von Neumann type, which separates the "arithmetic unit" that performs calculations from the "memory" that stores data.
While this structure is highly versatile, it suffers from a few weaknesses. Every time an operation is performed, a huge amount of data must be transferred between the processing unit and memory, and this transfer itself consumes a large portion of the power, creating a bottleneck in processing speed. This basic design has remained virtually unchanged for the past 30 years. Improvements in processor performance have been achieved primarily through the miniaturization of semiconductors, but this is also approaching its physical limits. In today's world, where the amount of data handled is exploding, as seen in AI and blockchain, this "von Neumann bottleneck" is a major hindrance to technological evolution.
Beyond the limits of CGRA and Systolic Array
To overcome this barrier, various architectures have been proposed, most notably the "Coarse-Grained Reconfigurable Array (CGRA)" and the "Systolic Array".
- CGRA: Its distinguishing feature is its flexibility, allowing its circuit configuration to be dynamically rewritten. However, this flexibility comes at a serious cost: reconfiguring (compiling) the circuit can take several hours to several days.
- Systolic Array: Used in Google's TPUs and other applications, it demonstrates high performance by specializing in AI matrix operations. However, because it is too specialized for a specific process, it lacks flexibility and is unsuitable for non-AI calculations such as blockchain.
Lenzo's CGLA was developed to solve these problems. They have succeeded in combining the reconfiguration flexibility of CGRA, the high speed of the Systolic Array, and the concept of near-memory computing, which minimizes data transfer to and from memory, into a single architecture.
The heart of CGLA: three innovative ideas
The superiority of CGLA is theoretically supported in an academic paper published by the Nakajima Laboratory at NAIST entitled "CGLA: Coarse-Grained Linear Array for Multi-Hash Acceleration in Blockchain Mining". At its core are the following three ideas:
- Self-updating data method: Blockchain mining involves a continuous brute force search for a value called a "nonce." In conventional methods, the computation unit must query the host PC for the next nonce value each time it attempts to find it, which creates a bottleneck in communication. CGLA implements a function that allows the computation unit (PE: Processing Element) to automatically update the nonce value itself. This dramatically reduces communication with the host PC, allowing the computation unit to focus on calculations.
- Scalable linear PE array: CGLA adopts an extremely simple structure in which the PEs (processing elements) are arranged in a one-dimensional line. This not only simplifies design but also provides high scalability, making it easy to scale performance by adding PEs as needed.
- Dedicated ALU design: The arithmetic unit (ALU) built into each PE is optimized for hash function calculations. It is designed to be able to perform multiple calculations in one clock cycle, boasting extremely high calculation efficiency.
Data reveals the power of CGLA
CGLA isn't just a theoretical concept. Lenzo has already run a prototype on a TySOM-3A FPGA board (an FPGA is a semiconductor whose circuit configuration can be reprogrammed after manufacturing), and has provided data to prove its capabilities.
A high hash rate that leaves the competition behind
In terms of hash rate, a performance indicator for cryptocurrency mining, CGLA surpasses the market leaders:
| Product name | Process node | TH/s (terahashes per second) |
| --- | --- | --- |
| Bitmain Antminer S21 Pro | 5nm | 234 |
| Lenzo Core | 16nm | 196.6 |
| Lenzo Core | 7nm | 619.29 |

(Source: Lenzo)
It is noteworthy that while Bitmain's latest mining machine, the Antminer S21 Pro, achieves 234 TH/s using the cutting-edge 5nm process, Lenzo Core achieves 619.29 TH/s, approximately 2.6 times that figure, using the previous-generation 7nm process.
Even the older 16nm process shows performance approaching that of the latest 5nm machine. This demonstrates the overwhelming superiority of the architecture itself, rather than the superiority of the process technology.
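The ratios quoted above follow directly from the table; a quick check, using only the figures published by Lenzo:

```python
# Hash rates from the table above, in TH/s.
antminer_s21_pro_5nm = 234
lenzo_core_16nm = 196.6
lenzo_core_7nm = 619.29

print(f"7nm Lenzo Core vs. 5nm Antminer: {lenzo_core_7nm / antminer_s21_pro_5nm:.2f}x")   # 2.65x
print(f"16nm Lenzo Core vs. 5nm Antminer: {lenzo_core_16nm / antminer_s21_pro_5nm:.2f}x")  # 0.84x
```

So even two process generations behind, the 16nm part reaches roughly 84% of the 5nm machine's hash rate.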
Up to 8.7x the energy efficiency of GPUs and up to 44.5x that of conventional CGRAs
Even more impressive than its performance is its energy efficiency. According to the aforementioned research paper, in an ASIC simulation using a 45nm process, CGLA outperformed existing technologies by an order of magnitude.
- Relative to GPUs: 2.8x to 8.7x more energy efficient than NVIDIA's high-end GPUs.
- Relative to conventional CGRAs: up to 17.8 times the throughput and up to 44.5 times the energy efficiency of previously studied CGRA architectures.
This is a crucial competitive advantage in the mining industry, where the cost of electricity for computation directly impacts profitability: Lenzo's architecture can perform more computations with less power.
Crypto assets are just the beginning
Despite CGLA's impressive performance, Lenzo isn't just aiming to dominate the mining market. Their ultimate goal is to disrupt the AI computing market.
Why "Crypto First"?
The biggest barrier to market entry for new semiconductor architectures is the lack of a software ecosystem. In particular, in the AI field, the development environment "CUDA" that NVIDIA has built over many years has become the de facto standard, and new entrants have been unable to break this stronghold.
So Lenzo has come up with a very clever two-step strategy.
- Entering a CUDA-independent market: Cryptocurrency mining does not rely on specific software frameworks like CUDA, and revenue is directly linked to the computational performance of the hardware. CGLA's overwhelming performance and efficiency make it ideal for establishing an early market.
- Exploiting the supply-demand imbalance: With the rise in cryptocurrency prices, the demand for mining machines is constantly outstripping supply. This market gap presents a huge opportunity for new players like Lenzo.
- Securing self-funding (self-mining): By mining with high-performance machines developed in-house, Lenzo can secure a stable source of revenue before product sales begin. This revenue will be reinvested in developing AI chips and building a software ecosystem, the company's biggest challenge.
The strategy is to avoid a direct confrontation with NVIDIA, first establish a foothold and secure funds in a market where it is easier to compete, and then, once ready, take on the AI market, which is Lenzo's main target.
Groundwork for AI servers: LLM work is already underway
Lenzo's AI effort is not just a concept. The company has already built a prototype AI server equipped with 64 Lenzo cores in an environment combining Intel servers and FPGAs, and has successfully run Meta's large language model (LLM), Llama, on it.
In a publicly released demo video, an FPGA equipped with a Lenzo Core running at just 140 MHz demonstrated inference performance approaching one-third that of an Intel Xeon processor (3.1 GHz), despite operating at only 1/22nd of the clock speed. It's a clear sign that CGLA has the potential to become a game-changer in AI inference processing as well.
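The per-clock arithmetic behind that claim can be checked directly. The figures are taken from the demo description above; the 1/3 performance ratio is approximate.

```python
cgla_clock_hz = 140e6    # Lenzo Core on FPGA: 140 MHz
xeon_clock_hz = 3.1e9    # Intel Xeon: 3.1 GHz
perf_ratio = 1 / 3       # CGLA inference throughput relative to the Xeon

clock_ratio = xeon_clock_hz / cgla_clock_hz      # ~22x slower clock
per_clock_advantage = perf_ratio * clock_ratio   # useful work per clock cycle
print(f"clock ratio: {clock_ratio:.0f}x, per-clock advantage: {per_clock_advantage:.1f}x")
```

In other words, at equal clock speed the prototype would do roughly 7x the work per cycle, which is the sense in which the architecture itself, rather than process technology or frequency, carries the performance.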
(Source: Lenzo)
Can Lenzo rekindle Japan’s semiconductor ambitions?
Top engineers who once helped put Japan at the forefront of global technology — from PlayStation to supercomputers — are coming together again to tackle today’s computing challenges from the architecture level.
Lenzo’s story feels to the author like more than just a startup journey; it carries a broader significance.
According to CEO Mr. Fujiwara, he has long felt concerned that, since the PlayStation 4 era, the industry has moved toward off-the-shelf architectures, losing momentum around original chip architecture development.
“In the past 30 years, we’ve seen very little true innovation in chip architecture. With our unique CGLA architecture, which can support both AI and crypto workloads, we’ve achieved something only we could.”
His words reflect a strong desire to help Japan return to the forefront of technological innovation after decades of stagnation. Of course, the road ahead won’t be easy. One of the biggest hurdles will likely be whether Lenzo can build a software development environment comparable to CUDA. No matter how capable the hardware, it won’t matter unless developers can make use of it. A key question will be how effectively the company can reinvest its mining revenue into software development. This is where Lenzo’s real capabilities will be tested.
Still, the value of the challenge itself is significant. As noted by renowned VC firm Andreessen Horowitz (a16z), the convergence of AI and crypto represents a major force driving the next wave of computing. Lenzo appears to have identified this intersection earlier — and perhaps more clearly — than most.
Whether they can meaningfully challenge an industry giant like NVIDIA and spark renewed momentum in Japan’s semiconductor sector remains to be seen. But their efforts are certainly worth watching.
Paper
- IEEE Xplore: CGLA: Coarse-Grained Linear Array for Multi-Hash Acceleration in Blockchain Mining
DOI: 10.1109/ISOCC62682.2024.10762376
TECHNICAL BRIEF: CGLA: Coarse-Grained Linear Array for Multi-Hash Acceleration in Blockchain Mining
Abstract: In emerging blockchain-based IoT systems, highly flexible and energy-efficient hash function hardware design is necessary to maintain the operation of diverse blockchain networks. Accordingly, a coarse-grained reconfigurable array (CGRA) is an optimal architecture for implementing hash functions; however, current CGRA-based works still suffer from slow speeds and low energy efficiency.
To solve these problems, this paper proposes a Coarse-Grained Linear Array (CGLA), upgrading from the CGRA, to perform multiple hash functions with high speed and energy efficiency. To achieve that goal, three main ideas are proposed: a self-updating data method, an expandable processing element array (PEA), and an efficient arithmetic logic unit (ALU) dedicated to hash functions. Our CGLA has been successfully implemented on a TySOM-3A FPGA. Evaluations on 45nm ASICs show that the CGLA is 2.8-8.7 times more power efficient than GPUs, and 1.3-17.8 times and 1.9-44.5 times better in throughput and energy efficiency than previous CGRAs.
Contact hello@lenzo.co.jp to request your full copy.
TECHNICAL BRIEF: Bonanza Mine an Ultra Low Voltage Energy Efficient Bitcoin Mining ASIC
Abstract: Bitcoin is the leading blockchain-based cryptocurrency used to facilitate peer-to-peer transactions without relying on a centralized clearing house [1]. The conjoined process of transaction validation and currency minting, known as mining, employs the compute-intensive SHA256 double hash as proof-of-work. The one-way property of SHA256 necessitates a brute-force search by sweeping a 32b random input value called nonce. The 2^32 nonce space search results in energy-intensive pool operations distributed on high-throughput mining systems, executing parallel nonce searches with candidate Merkle roots.
Energy-efficient custom ASICs are required for cost-effective mining, where energy costs dominate operational expenses, and the number of hash engines integrated on a single die govern platform cost and peak mining throughput [2]. In this paper, we present BonanzaMine, an energy-efficient mining ASIC fabricated in 7nm CMOS (Fig. 21.3.7), featuring: (i) bitcoin-optimized look-ahead message digest datapath resulting in 33% Cdyn reduction compared to conventional SHA256 digest datapath; (ii) a half-frequency scheduler datapath, reducing sequential and clock power by 33%; (iii) 3-phase latch-based design with stretchable non-overlapping clocks, eliminating min-delay paths; (iv) robust ultra-low-voltage operation at 355mV using board-level voltage-stacking; and (v) mining throughput of 137GHash/s at an energy efficiency of 55J/THash.
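Two back-of-envelope numbers implied by this abstract, using only the figures quoted above: the time one such ASIC needs to sweep a full 32-bit nonce space, and its power draw at the stated efficiency.

```python
NONCE_SPACE = 2**32               # 32-bit nonce
THROUGHPUT_HASH_S = 137e9         # 137 GHash/s per ASIC
EFFICIENCY_J_PER_THASH = 55       # 55 J/THash

sweep_seconds = NONCE_SPACE / THROUGHPUT_HASH_S
power_watts = (THROUGHPUT_HASH_S / 1e12) * EFFICIENCY_J_PER_THASH  # (THash/s) * (J/THash) = W

print(f"full nonce sweep: {sweep_seconds * 1000:.1f} ms")  # ~31 ms
print(f"power draw: {power_watts:.2f} W")                  # ~7.5 W
```

This is why mining systems sweep many candidate Merkle roots in parallel: a single 32-bit nonce space is exhausted in tens of milliseconds.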
Contact hello@lenzo.co.jp to request your full copy.
BLOG: CGRA vs. CGLA - Why CGLA is the Better Architecture
In recent years, the evolution of AI has shown no signs of slowing down. Supporting this AI revolution are specialized "AI chips" designed specifically for AI computations. Lately, a growing number of startups developing new AI chips have adopted an architecture called "CGRA."
However, an architecture that solves the major challenges of CGRA and takes things a step further—"CGLA"—remains largely unknown.
This article will explain the difference between CGRA and CGLA for beginners in the semiconductor field, and introduce the advantages and features that give CGLA the potential to dominate the AI era.
What is a CGRA?
Let's start with the basics. What is a CGRA?
In a nutshell, a CGRA (Coarse-Grained Reconfigurable Array) is like a "computer made of LEGO blocks that can be reshaped to fit the program."
If a general-purpose CPU is a "universal factory that can handle any calculation," then a CGRA is like a "specialized factory whose production lines (computing circuits) can be freely reconfigured to make a specific product (perform a specific computation)."
By arranging numerous small processing units and changing their connections according to the program, a CGRA can execute specific tasks like AI image recognition or data analysis overwhelmingly faster and with less power than a CPU.
This is why many AI chip startups are focusing on CGRA. However, this "reconfiguring the production line" process has a major weakness.
The "Major Weakness" of CGRA
The weakness of CGRA is its "compilation time."
Compilation is the process of converting a program written by a human into a blueprint that the hardware can understand. For a CGRA, this is equivalent to figuring out the optimal layout of the production line to achieve maximum efficiency.
Finding the best placement and connection scheme for countless units is like solving an incredibly complex puzzle. As a result, it's not uncommon for CGRA compilation to take several hours, sometimes even more than a full day.
This means that every time you want to try a new AI model or make a small change to a program, it becomes a day-long affair. This is a critical flaw in today's world, where development speed is everything.
Enter the Savior: CGLA
This is where CGLA (Coarse-Grained Linear Array) comes in. As its name suggests, it is an architecture in which processing units are connected in a linear, ring-like fashion, packed with innovative ideas that solve CGRA's challenges. NAIST's "IMAX" is a prime example of a CGLA.
CGLA's Advantage #1: Compilation Time in "Seconds"
Why is CGLA so fast to compile? The secret lies in its namesake "linear (ring-like)" structure.
- CGRA: Units are arranged in a complex 2D grid, making it very difficult to find the optimal routes.
- CGLA: Units are connected in a simple ring (linear) structure, which eliminates the need for the compiler to search for complex routes, making the program "mapping" process extremely easy.
This dramatically reduces compilation time from hours to just a few seconds. For developers, this is a revolutionary change.
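A toy sketch of why linear mapping is cheap (a hypothetical illustration; real compilers also handle routing, timing, and resource constraints): on a ring, assigning operations to PEs is a single pass with no search.

```python
def map_to_ring(ops: list[str], num_pes: int) -> dict[str, int]:
    """Linear (ring) mapping: operation i simply goes to PE i mod num_pes.
    One O(n) pass -- no placement-and-routing search as on a 2D grid."""
    return {op: i % num_pes for i, op in enumerate(ops)}

# A small dataflow chain, e.g. one round of a hash computation:
ops = ["load", "xor", "rotl", "add", "store"]
print(map_to_ring(ops, num_pes=4))
# {'load': 0, 'xor': 1, 'rotl': 2, 'add': 3, 'store': 0}
```

On a 2D mesh, by contrast, the compiler must search over placements and wire routes jointly, which is where the hours-to-days compile times come from.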
CGLA's Advantage #2: The Pipeline "Never" Stalls
The biggest drain on a computer's performance is the "pipeline stall," which occurs when the process has to wait for data.
The CGLA (like IMAX) uses the dataflow principle and multithreading technology within its units to eliminate stalls by design.
This is like a factory production line where parts always arrive at the worker's station at the exact moment they are needed. Because the hardware continues to operate without any waste, it maintains extremely high computational efficiency.
CGLA's Advantage #3: Thoroughly Energy-Efficient Design
Moving data is a major source of power consumption. The CGLA (like IMAX) places a large cache memory right next to each processing unit.
This is like each worker on the assembly line having their own personal, large toolbox and parts bin. Since there's no need to constantly walk to a central warehouse for parts, data movement is minimized, leading to significant energy savings.
Summary: CGRA vs. CGLA
| Feature | Typical CGRA | CGLA (e.g., IMAX) |
| --- | --- | --- |
| Unit connection | Complex 2D mesh structure | Simple linear (ring) structure |
| Compilation time | Hours to 1+ day | Seconds |
| Pipeline efficiency | Prone to stalls | Stall-less |
| Memory structure | Frequent access to shared memory | Local cache per unit, minimal data movement |
| Development efficiency | Low | Extremely high |
As you can see, CGLA is truly a next-generation architecture that refines the CGRA concept and solves its practical challenges.
While startups using CGRA are making waves in the AI chip market today, more advanced technologies like CGLA are steadily moving towards practical application. The day that CGLA—with its combined development efficiency, execution efficiency, and energy savings—becomes the standard for future AI chip development may not be far off.
Ready to power the next era of intelligent infrastructure?
