Apr 14, 2026 · arXiv:2604.12618

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

Weichuang Zhang, Weichuan Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Xiaofeng Hou, Chao Li, Jieru Zhao, Jie Zhao, Minyi Guo

AI Summary

CODO is an automated compiler that generates efficient dataflow accelerators on FPGAs. It addresses how hard such accelerators are to implement by hand for large-scale applications. It systematically detects and eliminates dataflow violations at both coarse-grained and fine-grained levels, optimizes both on-chip and off-chip data movement, and automatically schedules the design to balance performance against resource usage. Compared with state-of-the-art frameworks, CODO achieves latency speedups of 1.45x to 4.52x on computation kernels and 3.7x to 33.8x on DNN models in synthesis results. In on-board evaluations it achieves average speedups of 7.3x on CNN models and 2.07x on the GPT-2 model.
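This is a minimal illustration of what a coarse-grained dataflow violation can look like. It is a sketch under stated assumptions, not CODO's actual algorithm. HLS dataflow regions generally expect each buffer between tasks to have exactly one producer and one consumer. The Python sketch below models a task graph as hypothetical (name, reads, writes) tuples and reports buffers that break this rule. It then removes multi-consumer violations by inserting a hypothetical split_<buffer> task that copies the buffer into a private channel for each consumer. Multi-producer cases are only reported, not rewritten.

```python
from collections import defaultdict


def find_spsc_violations(tasks):
    """Report buffers that break the single-producer/single-consumer rule.

    tasks: list of (name, reads, writes) tuples, where reads/writes are lists
    of buffer names (a hypothetical task-graph encoding for illustration).
    Returns {buffer: (producers, consumers)} for each offending buffer.
    """
    producers = defaultdict(list)
    consumers = defaultdict(list)
    for name, reads, writes in tasks:
        for buf in reads:
            consumers[buf].append(name)
        for buf in writes:
            producers[buf].append(name)
    buffers = sorted(set(producers) | set(consumers))
    return {
        buf: (producers[buf], consumers[buf])
        for buf in buffers
        if len(producers[buf]) > 1 or len(consumers[buf]) > 1
    }


def split_multi_consumer(tasks):
    """Give every consumer of a multiply-read buffer its own copy.

    A new split_<buf> task reads <buf> once and writes <buf>_<consumer> for
    each consumer; consumers are rewired to read their private copy.
    Buffers with several producers are left unchanged (still reported).
    """
    violations = find_spsc_violations(tasks)
    shared = {buf for buf, (_, cons) in violations.items() if len(cons) > 1}
    rewired = [
        (name, [f"{buf}_{name}" if buf in shared else buf for buf in reads], list(writes))
        for name, reads, writes in tasks
    ]
    splits = [
        (f"split_{buf}", [buf], [f"{buf}_{c}" for c in violations[buf][1]])
        for buf in sorted(shared)
    ]
    return rewired + splits


tasks = [
    ("load", ["in"], ["a"]),
    ("conv", ["a"], ["b"]),
    ("add", ["a", "b"], ["out"]),  # "a" is read by both conv and add
]
print(find_spsc_violations(tasks))  # {'a': (['load'], ['conv', 'add'])}
print(find_spsc_violations(split_multi_consumer(tasks)))  # {}
```

A real compiler would also have to size the new channels and confirm that the extra copy fits the resource budget. This sketch ignores both.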

Key Contribution

Forget hand-tuning: CODO automatically compiles efficient FPGA dataflow accelerators, delivering up to 33.8x speedup on DNN models over state-of-the-art frameworks.

Abstract

FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient dataflow architecture for large-scale applications is still challenging, even for specialists who use high-level synthesis (HLS) to simplify FPGA programming. To address this, we introduce CODO, an automated compiler that generates feasible and efficient dataflow accelerators on FPGAs. CODO features a systematic method for detecting and eliminating both coarse-grained and fine-grained dataflow violations. Building on this, CODO performs both on- and off-chip data movement optimizations to maximize transfer efficiency. To guarantee higher design quality, CODO performs automatic scheduling to generate high-performance dataflow accelerators, ensuring a balanced performance-resource trade-off. Synthesis results show that CODO delivers $1.45\times$ to $4.52\times$ latency speedups on typical computation kernels and $3.7\times$ to $33.8\times$ speedups on DNN models compared to SOTA frameworks. In on-board evaluations, CODO achieves $7.3\times$ average speedup on CNN models and $2.07\times$ average speedup on the GPT-2 model over SOTA frameworks. The compiler is open-sourced at https://github.com/sjtu-zhao-lab/codo-artifact.
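The abstract does not define fine-grained violations. One plausible example, assumed here rather than taken from the paper, is a producer and a consumer that traverse a shared array in different orders. A FIFO stream works only when the consumer reads elements in exactly the order the producer writes them, each exactly once. Otherwise, the buffer has to remain a full double-buffered (ping-pong) array. The Python sketch below models 2-D loop nests with hypothetical helpers, access_order and channel_kind, to make that decision.

```python
def access_order(shape, loop_order):
    """List the (i, j) indices a perfect 2-D loop nest visits, in order.

    loop_order "ij" walks rows in the outer loop (row-major);
    "ji" walks columns in the outer loop (column-major).
    """
    rows, cols = shape
    if loop_order == "ij":
        return [(i, j) for i in range(rows) for j in range(cols)]
    if loop_order == "ji":
        return [(i, j) for j in range(cols) for i in range(rows)]
    raise ValueError(f"unsupported loop order: {loop_order!r}")


def channel_kind(write_seq, read_seq):
    """Choose a channel for a producer/consumer buffer.

    A FIFO is safe only if every element is written once and the consumer
    reads in exactly the producer's write order; otherwise fall back to a
    ping-pong buffer that holds the whole array.
    """
    written_once = len(set(write_seq)) == len(write_seq)
    return "fifo" if written_once and write_seq == read_seq else "ping-pong"


produced = access_order((4, 4), "ij")
print(channel_kind(produced, access_order((4, 4), "ij")))  # fifo
print(channel_kind(produced, access_order((4, 4), "ji")))  # ping-pong
```

Sometimes the mismatch comes only from loop order. In that case, interchanging the consumer's loops, if dependences allow it, would turn the ping-pong case back into a FIFO. This sketch does not perform that dependence check.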

Architecture Design (Transformers, SSMs, MoE); Code Generation & Program Synthesis; Distributed Systems & Hardware

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
