DAMO, HKU, HKUST, INFIFORCE, PKU, Tongji

Jun 3, 2026

arXiv:2606.05158

Streaming Communication in Multi-Agent Reasoning

Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen, Xander Xu, Ying-Cong Chen

AI Summary

This paper introduces StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, reducing latency relative to the conventional generate-then-transfer approach. Streaming also improves effectiveness: early reasoning steps tend to be more reliable than later ones, so downstream agents that work from the early steps are less likely to be misled by error-prone late steps. Across eight reasoning benchmarks, StreamMA outperforms both baselines by 7.3 percentage points on average. The authors also report a "step-level scaling law": increasing the number of steps per agent improves both effectiveness and efficiency.

Key Contribution

Streaming reasoning steps between agents improves multi-agent performance by 7.3 percentage points on average over the baselines, and the number of steps per agent emerges as a new scaling dimension for both effectiveness and efficiency.

Abstract

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
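
To make the latency argument concrete, here is a minimal sketch of a toy timing model for a Chain topology. It is not the paper's closed-form analysis. It assumes that every reasoning step takes the same time, that transfer cost is zero, and that a downstream agent can begin its step j as soon as its upstream neighbor has emitted step j. The function names and parameters (`serial_latency`, `stream_latency`, `step_time`) are hypothetical and chosen only for illustration.

```python
def serial_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Generate-then-transfer: each agent waits for the full upstream chain,
    so end-to-end latency grows linearly with pipeline depth."""
    return num_agents * steps_per_agent * step_time


def stream_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Streaming (toy assumption): agent i may start its step j once it has
    finished its own step j-1 and agent i-1 has emitted step j."""
    # finish[i][j] is the time at which agent i finishes step j.
    finish = [[0.0] * steps_per_agent for _ in range(num_agents)]
    for i in range(num_agents):
        for j in range(steps_per_agent):
            own_prev = finish[i][j - 1] if j > 0 else 0.0
            upstream = finish[i - 1][j] if i > 0 else 0.0
            finish[i][j] = max(own_prev, upstream) + step_time
    return finish[-1][-1]


def speedup(num_agents: int, steps_per_agent: int) -> float:
    """Ratio of serial to streaming latency under the toy model."""
    return serial_latency(num_agents, steps_per_agent) / stream_latency(
        num_agents, steps_per_agent
    )


if __name__ == "__main__":
    for steps in (1, 5, 50):
        print(f"4 agents, {steps} steps each: speedup = {speedup(4, steps):.2f}")
```

Under these assumptions, streaming latency for n agents with k steps each is (n + k - 1) step times instead of n * k, so the toy speedup approaches the number of agents as the number of steps per agent grows. This is the behavior of a classic pipeline and is only suggestive of why more per-agent steps can help efficiency; the speedup upper bound derived in the paper may take a different form.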

Distributed Systems & Hardware

Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
