CMU ML | BIT | DUT | Xiaohongshu | May 26, 2026 | arXiv:2605.27030

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

Xinglin Wang, Hao Lin, Shaoxiong Feng, Peiwen Yuan, Yiwei Li, Jiayi Shi, Yueqi Zhang, Chuyi Tan, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li

AI Summary

The paper introduces Collaborative Parallel Thinking (CPT), a novel training-free inference framework for test-time scaling (TTS) that enables information sharing across parallel search branches in LLMs. CPT extracts and deduplicates intermediate information from branches into a shared pool, broadcasting it to other branches via the input context to reduce redundant exploration. Experiments on HMMT and AIME demonstrate that CPT achieves a superior accuracy-latency trade-off compared to isolated parallel search methods.

Key Contribution

LLMs can reason more efficiently when parallel search branches share intermediate discoveries, yielding a better accuracy-latency trade-off than isolated parallel search.

Abstract

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and cannot guide other branches in time. This information isolation causes substantial redundant exploration, as branches repeatedly rediscover information already found elsewhere and require more search steps to collect complete decision information needed to reach correct answers. To bridge this gap, we propose Collaborative Parallel Thinking (CPT), a training-free inference framework that enables search-time information sharing across parallel branches. CPT extracts compact intermediate information from ongoing branches, maintains a deduplicated query-level information pool, and broadcasts pool entries through the input context, allowing each branch in subsequent search steps to reuse discoveries made by other branches rather than rediscover the same information. Empirically, experiments on HMMT and AIME benchmarks show that CPT establishes a stronger accuracy-latency Pareto frontier than strong baselines across rollout budgets and model scales, highlighting search-time collaboration as an effective direction for efficient parallel TTS.
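The extract, deduplicate, and broadcast loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `InfoPool`, `build_prompt`, `collaborative_search`, `step_fn`, and `extract_fn` are hypothetical, the model call is abstracted behind `step_fn`, and deduplication is simplified to exact matching on normalized text, whereas the paper may use a different criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


def _normalize(text: str) -> str:
    """Case- and whitespace-insensitive key used for exact-match dedup (an assumption)."""
    return " ".join(text.lower().split())


@dataclass
class InfoPool:
    """Hypothetical query-level pool of intermediate findings, deduplicated on insert."""
    entries: List[str] = field(default_factory=list)
    _seen: Set[str] = field(default_factory=set)

    def add(self, item: str) -> bool:
        """Store the item unless an equivalent one exists; return True if stored."""
        key = _normalize(item)
        if not key or key in self._seen:
            return False
        self._seen.add(key)
        self.entries.append(item.strip())
        return True

    def render(self) -> str:
        """Format pool entries as a bullet list for the input context."""
        return "\n".join(f"- {e}" for e in self.entries)


def build_prompt(query: str, shared: str, trace: str) -> str:
    """Broadcast step: place the shared pool in the branch's input context."""
    parts = [f"Question: {query}"]
    if shared:
        parts.append("Shared findings:\n" + shared)
    parts.append("Your reasoning so far:\n" + trace)
    return "\n\n".join(parts)


def collaborative_search(
    query: str,
    step_fn: Callable[[str, int], str],       # stand-in for one LLM search step
    extract_fn: Callable[[str], List[str]],   # stand-in for the information extractor
    num_branches: int,
    num_steps: int,
) -> Tuple[List[str], InfoPool]:
    pool = InfoPool()
    traces = [""] * num_branches
    for _ in range(num_steps):
        # Snapshot the pool so every branch in this step sees the same context.
        shared = pool.render()
        outputs = [
            step_fn(build_prompt(query, shared, trace), i)
            for i, trace in enumerate(traces)
        ]
        for i, out in enumerate(outputs):
            traces[i] += out
            for item in extract_fn(out):
                pool.add(item)  # duplicates across branches are dropped here
    return traces, pool
```

In this sketch the branches run sequentially for clarity; a real system would presumably batch the `step_fn` calls so that sharing happens between parallel decoding steps.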

Distributed Systems & Hardware | Inference & Quantization | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A