Tsinghua AI | Feb 16, 2026 | arXiv:2602.14393

Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators

Zongle Huang, Hongyang Jia, Kaiwei Zou, Yongpan Liu

AI Summary

The paper introduces Scope, a merged pipeline framework for multi-chip-module (MCM) neural network accelerators. Scope considers multiple layers jointly rather than deploying them separately, which improves throughput and scalability by relaxing the tradeoffs between computation, communication, and memory costs. Because this added dimension enlarges the design space, the authors develop search algorithms that reduce search complexity from exponential to linear while still identifying solutions that rank in the top 0.05% of performance. Experiments on ResNet-152 inference show that Scope achieves up to 1.73x higher throughput than state-of-the-art approaches at similar energy consumption.

Key Contribution

Scope achieves up to 1.73x higher throughput on multi-chip-module neural network accelerators by jointly optimizing the pipelining of multiple layers, a dimension that prior work overlooked.

Abstract

Neural network (NN) accelerators with multi-chip-module (MCM) architectures enable the integration of massive computation capability; however, they face two challenges: underutilization of computing resources and off-chip communication overhead. Traditional parallelization schemes for NN inference on MCM architectures, such as intra-layer parallelism and inter-layer pipelining, fail to overcome both challenges at once, which limits the scalability of MCM architectures. We observe that existing works typically deploy layers separately rather than considering them jointly. Leaving this dimension underexploited forces compromises between system computation and communication, hindering optimal utilization, especially as hardware and software scale. To address this limitation, we propose Scope, a merged pipeline framework that incorporates this overlooked multi-layer dimension, thereby achieving improved throughput and scalability by relaxing the tradeoffs between computation, communication, and memory costs. This new dimension, however, adds to the complexity of design space exploration (DSE). To tackle this, we develop a series of search algorithms that achieve an exponential-to-linear complexity reduction while identifying solutions that rank in the top 0.05% of performance. Experiments show that Scope achieves up to 1.73x throughput improvement while maintaining similar energy consumption for ResNet-152 inference compared to state-of-the-art approaches.
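To make the "exponential-to-linear" claim concrete, the toy sketch below contrasts the number of candidate evaluations in an exhaustive search with that of a layer-by-layer search. It is not the search algorithm from the paper, which this page does not describe. The cost table, the function names, and above all the assumption that per-layer costs are independent and additive are hypothetical and chosen only for illustration. Real MCM mappings couple neighboring layers through communication and memory, which is precisely why a practical DSE needs more care than this.

```python
from itertools import product

def exhaustive_search(costs):
    """Try every combination of per-layer options: k**L evaluations.

    costs[i][j] is a hypothetical cost of choosing option j for layer i.
    """
    n_opts = len(costs[0])
    best_total, best_cfg, evaluated = float("inf"), None, 0
    for cfg in product(range(n_opts), repeat=len(costs)):
        evaluated += 1
        total = sum(costs[i][j] for i, j in enumerate(cfg))
        if total < best_total:
            best_total, best_cfg = total, cfg
    return best_cfg, best_total, evaluated

def layerwise_search(costs):
    """Pick the cheapest option per layer: k*L evaluations.

    Exact only under the toy assumption that layer costs are
    independent and additive.
    """
    cfg, evaluated = [], 0
    for layer in costs:
        evaluated += len(layer)
        cfg.append(min(range(len(layer)), key=layer.__getitem__))
    total = sum(costs[i][j] for i, j in enumerate(cfg))
    return tuple(cfg), total, evaluated

# Hypothetical cost table: 4 layers, 3 mapping options each.
toy_costs = [[3, 1, 2], [5, 4, 6], [2, 2, 1], [7, 9, 8]]
print(exhaustive_search(toy_costs))  # evaluates 3**4 = 81 candidates
print(layerwise_search(toy_costs))   # evaluates 3*4 = 12 candidates
```

With L layers and k options per layer, the exhaustive count grows as k^L while the layer-wise count grows as k·L, so the gap widens quickly for deep networks such as ResNet-152.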

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
