This paper addresses the problem of chunk ordering in Chain-of-Agents (CoA) frameworks for long-context reasoning, where the order in which input chunks are processed affects performance because agents exchange only bounded summaries, creating an information bottleneck. The authors propose using Chow-Liu trees to learn a dependency structure between chunks and to prioritize strongly related chunks during processing. Experiments on three long-context benchmarks show that a breadth-first traversal of the Chow-Liu tree outperforms both default document ordering and semantic score-based ordering in answer relevance and exact-match accuracy.
Chain-of-Agents can reason more accurately over long contexts by processing chunks in an order determined by a Chow-Liu dependency tree, rather than in default document order or by semantic-similarity score.
Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded shared memory. From a probabilistic perspective, CoA aims to approximate the conditional distribution corresponding to a model capable of jointly reasoning over the entire long context. CoA achieves this through a latent-state factorization in which only bounded summaries of previously processed evidence are passed between agents. The resulting bounded-memory approximation introduces a lossy information bottleneck, making the final evidence state inherently dependent on the order in which chunks are processed. In this work, we study the problem of chunk ordering for long-context reasoning. We use the well-known Chow-Liu trees to learn a dependency structure that prioritizes strongly related chunks. Empirically, we show that a breadth-first traversal of the resulting tree yields chunk orderings that reduce information loss across agents and consistently outperform both default document-chunk ordering and semantic score-based ordering in answer relevance and exact-match accuracy across three long-context benchmarks.
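The ordering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how pairwise chunk dependencies are estimated, so the symmetric score matrix `mi` below is hypothetical, as are the function names, the choice of chunk 0 as the root, and the rule of visiting stronger neighbours first. The sketch builds a Chow-Liu-style tree as a maximum-weight spanning tree over the pairwise scores (Prim's algorithm) and then reads off a breadth-first chunk order.

```python
from collections import deque


def chow_liu_tree(weights):
    """Maximum-weight spanning tree over a symmetric score matrix (Prim's algorithm).

    `weights[i][j]` is an assumed dependency score between chunks i and j,
    for example an estimate of their mutual information.
    """
    n = len(weights)
    adjacency = {i: [] for i in range(n)}
    if n == 0:
        return adjacency
    in_tree = {0}
    while len(in_tree) < n:
        # Pick the heaviest edge linking the current tree to a chunk outside it.
        u, v = max(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda edge: weights[edge[0]][edge[1]],
        )
        adjacency[u].append(v)
        adjacency[v].append(u)
        in_tree.add(v)
    return adjacency


def bfs_order(adjacency, weights, root=0):
    """Breadth-first chunk order; stronger neighbours first (an assumption)."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(adjacency[node], key=lambda m: -weights[node][m]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical pairwise dependency scores for four chunks.
mi = [
    [0.0, 0.1, 0.8, 0.2],
    [0.1, 0.0, 0.3, 0.7],
    [0.8, 0.3, 0.0, 0.4],
    [0.2, 0.7, 0.4, 0.0],
]

tree = chow_liu_tree(mi)
order = bfs_order(tree, mi, root=0)
print(order)  # [0, 2, 3, 1]
```

With these made-up scores the tree is the path 0-2-3-1, so worker agents would read chunk 2 immediately after chunk 0 instead of following document order. In a real pipeline the scores would have to come from whatever dependency estimate the method uses, and the root could be chosen by relevance to the query.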