Google Research, HKUST, Independent Researcher, McGill, UofT
Apr 5, 2026
arXiv:2604.04236

NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures

Shangkun Li, Jinming Ge, Diyuan Tao, Zeyu Li, Jiawei Liang, Linfeng Du, Jiang Xu, Cheng Tan

AI Summary

This paper introduces NEURA, a compilation framework that addresses the control-dataflow mismatch in CGRAs by representing control flow as predicates within a pure dataflow intermediate representation. This allows NEURA to flatten complex control flow into a unified dataflow graph, decoupling kernel representation from specific hardware architectures. Experiments show that, on a high-performance spatio-temporal CGRA, NEURA achieves a 2.20x speedup on kernel benchmarks and up to 2.71x geometric mean speedup on real-world applications over state-of-the-art baselines. When retargeted to a spatial-only CGRA, it is competitive with the state-of-the-art low-power CGRA.

Key Contribution

NEURA is a compilation framework that flattens control flow into a pure dataflow IR, delivering up to 2.71x geometric mean speedup on real-world applications over state-of-the-art baselines on a high-performance CGRA.

Abstract

Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising and versatile accelerator platform that balances the performance and efficiency of specialized accelerators with software programmability. However, their full potential is severely hindered by control flow in accelerated kernels: control flow (e.g., loops, branches) is fundamentally incompatible with the parallel, data-driven CGRA fabric. Prior strategies for resolving this mismatch are either inefficient, sacrificing performance for generality, or lack generality because they are difficult to adapt across different execution models. Thus, a general and unified solution for efficient CGRA kernel acceleration remains elusive. This paper introduces NEURA, a unified and retargetable compilation framework that systematically resolves the control-dataflow mismatch in CGRAs. NEURA's core innovation is a novel, pure dataflow intermediate representation (IR) built on a predicated type system. In this IR, the control context is embedded as a predicate within each data value, making control an intrinsic property of data. This mechanism enables NEURA to systematically flatten complex control flow into a single unified dataflow graph. This unified representation decouples kernel representation from hardware, allowing NEURA to retarget diverse CGRAs with different execution models and microarchitectural features. When targeted to a high-performance spatio-temporal CGRA, NEURA delivers a 2.20x speedup on kernel benchmarks and up to 2.71x geometric mean speedup on real-world applications over state-of-the-art (SOTA) high-performance baselines. When retargeted to a spatial-only CGRA, it is also competitive with the SOTA low-power CGRA. NEURA is open-source and available at https://github.com/coredac/neura.
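
To make the idea of "control as an intrinsic property of data" concrete, here is a minimal Python sketch of predicated values. It is not NEURA's IR or API: the names `Pred`, `grant`, `merge`, `p_add`, `p_mul`, and `p_gt` are hypothetical, chosen only for illustration. It assumes each value carries a validity bit, that operations propagate that bit, and that a branch can then be flattened into straight-line dataflow where both arms are evaluated and only the valid result survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A data value tagged with a validity predicate (hypothetical type)."""
    value: int
    valid: bool


def p_add(a: Pred, b: Pred) -> Pred:
    # The result is valid only if both operands are valid.
    return Pred(a.value + b.value, a.valid and b.valid)


def p_mul(a: Pred, b: Pred) -> Pred:
    return Pred(a.value * b.value, a.valid and b.valid)


def p_gt(a: Pred, b: Pred) -> Pred:
    # Comparison yields 1 or 0 as data, with the usual predicate propagation.
    return Pred(int(a.value > b.value), a.valid and b.valid)


def grant(x: Pred, cond: Pred, when: bool) -> Pred:
    """Keep x valid only if cond is valid and its truth value equals `when`."""
    return Pred(x.value, x.valid and cond.valid and (bool(cond.value) == when))


def merge(a: Pred, b: Pred) -> Pred:
    """Forward the first valid input; the result is invalid if neither is valid."""
    return a if a.valid else b


def kernel_branchy(x: int) -> int:
    # Reference version written with ordinary control flow.
    if x > 0:
        return x * 2
    return x + 10


def kernel_flat(x: int) -> Pred:
    # Same kernel as a branch-free dataflow graph over predicated values.
    xin = Pred(x, True)
    cond = p_gt(xin, Pred(0, True))
    then_x = grant(xin, cond, when=True)    # valid only on the "then" path
    else_x = grant(xin, cond, when=False)   # valid only on the "else" path
    then_out = p_mul(then_x, Pred(2, True))
    else_out = p_add(else_x, Pred(10, True))
    return merge(then_out, else_out)
```

In this sketch, `kernel_flat` contains no `if` statement: the condition is just another data value, and the predicate on each operand decides which arm's result `merge` forwards. How NEURA actually encodes and lowers predicates is described in the paper and repository, not here.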

Architecture Design (Transformers, SSMs, MoE); Distributed Systems & Hardware

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
