ETH · Bologna · Jun 1, 2026 · arXiv:2606.02358

CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees

Lorenzo Leone, Philip Wiese, Gamze İslamoğlu, Michael Rogenmoser, Davide Rossi, Francesco Conti, Luca Benini

AI Summary

Chimera is a Microcontroller Unit (MCU) for ultra-low-power edge applications that accelerates real-time inference of transformer-based models by tightly coupling a transformer accelerator with a compute cluster of nine RV32IMA cores. Implemented in 22 nm FDX technology, it features an L2 memory island subsystem that shares data across multiple clusters at 563 Gb/s aggregate bandwidth while enforcing quality-of-service guarantees for latency-critical traffic, reducing the latency of that traffic by up to 16x. The chip reaches peak energy and area efficiencies of 3.1 TOPS/W and 281 GOPS/mm². That is 1.37x higher energy efficiency and up to 100x higher area efficiency than state-of-the-art SoCs, and comparable energy efficiency with up to 1.8x higher area efficiency than state-of-the-art standalone accelerators.

Key Contribution

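With a peak energy efficiency of 3.1 TOPS/W, Chimera is 1.37x more energy-efficient than state-of-the-art SoCs for ultra-low-power AI inference at the edge, and comparable in energy efficiency to state-of-the-art standalone accelerators.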
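As a rough sanity check on what a TOPS/W figure means, the sketch below converts the 3.1 TOPS/W peak into energy per operation and into the throughput that would be reachable at a given power budget. It assumes the peak efficiency could be sustained at that budget, which real silicon generally does not; the 100 mW operating point and the helper names are hypothetical illustrations chosen within the "hundreds of mW" range quoted in the abstract, not measured Chimera figures.

```python
# Back-of-the-envelope unit conversions for a TOPS/W efficiency figure.
# Hypothetical helpers; the 100 mW budget below is an illustrative
# assumption, not a measured Chimera operating point.

PEAK_TOPS_PER_W = 3.1  # peak energy efficiency quoted for Chimera


def energy_per_op_pj(tops_per_w: float) -> float:
    """Energy per operation in picojoules.

    1 TOPS/W = 1e12 ops per joule, so one op costs 1 / (x * 1e12) J,
    which is exactly 1 / x pJ.
    """
    return 1.0 / tops_per_w


def throughput_gops(tops_per_w: float, power_mw: float) -> float:
    """Throughput in GOPS if peak efficiency held at `power_mw`.

    TOPS/W * mW = (1e12 ops/s/W) * (1e-3 W) = 1e9 ops/s, i.e. GOPS.
    """
    return tops_per_w * power_mw


print(f"{energy_per_op_pj(PEAK_TOPS_PER_W):.3f} pJ/op")              # 0.323 pJ/op
print(f"{throughput_gops(PEAK_TOPS_PER_W, 100):.0f} GOPS at 100 mW")  # 310 GOPS at 100 mW
```

Efficiency figures of this kind often count one multiply-accumulate as two operations; the abstract does not state which convention is used, so the per-operation energy should be read with that in mind.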

Abstract

We present Chimera, a flexible and scalable Microcontroller Unit (MCU) designed to accelerate real-time inference of rapidly evolving transformer-based models at the ultra-low-power edge (hundreds of mW). The chip, implemented in 22 nm FDX technology, integrates a transformer accelerator tightly coupled within a compute cluster featuring nine general-purpose RV32IMA cores. Scalability extends to the memory hierarchy through a novel L2 memory island subsystem, which enables data sharing across multiple clusters while delivering 563 Gb/s aggregate bandwidth. The L2 subsystem enforces quality-of-service guarantees for latency-critical traffic, achieving up to 16x latency reduction. Chimera achieves peak energy and area efficiencies of 3.1 TOPS/W and 281 GOPS/mm², demonstrating 1.37x higher energy efficiency and up to 100x higher area efficiency compared to State of the Art (SoA) SoCs. Compared to SoA standalone accelerators, Chimera achieves comparable energy efficiency and up to 1.8x higher area efficiency.
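The abstract does not describe how the L2 subsystem enforces its quality-of-service guarantees, so the following is only a generic illustration of why prioritizing latency-critical traffic helps, not Chimera's mechanism. It models a hypothetical single memory port that serves one request per cycle, with a burst of bulk requests (for example, a DMA transfer) queued ahead of one latency-critical request, and compares plain FIFO ordering with strict priority. The function `critical_latency` and its cycle model are assumptions made for this sketch.

```python
from collections import deque


def critical_latency(bulk_requests: int, critical_arrival: int, prioritize: bool) -> int:
    """Cycles from the arrival of one latency-critical request until it completes.

    Hypothetical model: a single memory port serves one request per cycle,
    `bulk_requests` bulk requests are all queued at cycle 0, and the
    critical request arrives at cycle `critical_arrival`.
    """
    queue = deque(["bulk"] * bulk_requests)
    cycle = 0
    while True:
        if cycle == critical_arrival:
            if prioritize:
                queue.appendleft("critical")  # strict priority: jump ahead of bulk traffic
            else:
                queue.append("critical")      # FIFO: wait behind everything already queued
        served = queue.popleft() if queue else None  # port idles if nothing is pending
        if served == "critical":
            return cycle - critical_arrival + 1  # count the service cycle itself
        cycle += 1


print(critical_latency(32, 1, prioritize=False))  # 32
print(critical_latency(32, 1, prioritize=True))   # 1
```

Under FIFO ordering the critical request's latency grows with the length of the bulk burst ahead of it, while under strict priority it stays at one cycle. Strict priority can starve bulk traffic, so practical QoS schemes typically bound how often the high-priority class may preempt. The speedup in this toy model depends only on the chosen burst length and says nothing about the up-to-16x reduction reported for Chimera.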

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
