The paper introduces AggAgent, a method for aggregating parallel trajectories in long-horizon agentic tasks by treating them as an environment that a lightweight agent can navigate. AggAgent uses tools to inspect candidate solutions and synthesize information across trajectories, avoiding the limitations of naive aggregation such as using final answers only or concatenating every trajectory in full. Experiments across six benchmarks and three model families show that AggAgent outperforms existing aggregation methods by up to 5.3% absolute on average and by up to 10.3% on two deep research tasks, while keeping overhead low.
Stop throwing away valuable trajectory data: an aggregation agent can synthesize information from parallel agent rollouts, boosting performance by up to 10.3% on deep research tasks.
We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel and aggregated into a final response. While such scaling has proven effective for chain-of-thought reasoning, agentic tasks pose unique challenges: trajectories are long, multi-turn, and tool-augmented, and outputs are often open-ended. Aggregating only final answers discards rich information from trajectories, while concatenating all trajectories exceeds the model's context window. To address this, we propose AggAgent, an aggregation agent that treats parallel trajectories as an environment. We equip it with lightweight tools to inspect candidate solutions and search across trajectories, enabling it to navigate and synthesize information on demand. Across six benchmarks and three model families (GLM-4.7, Qwen3.5, MiniMax-M2.5), AggAgent outperforms all existing aggregation methods (by up to 5.3% absolute on average and 10.3% on two deep research tasks) while adding minimal overhead, as the aggregation cost remains bounded by a single agentic rollout. Our findings establish agentic aggregation as an effective and cost-efficient approach to parallel test-time scaling.
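To make the idea of "trajectories as an environment" concrete, here is a minimal Python sketch of the kind of read-only tool interface an aggregation agent might call. It is an illustration under stated assumptions, not the paper's implementation: the class and method names (`TrajectoryEnv`, `list_solutions`, `search`, `read`), the data layout, and the substring-based search are all hypothetical, since the abstract does not specify AggAgent's actual tools.

```python
from dataclasses import dataclass


@dataclass
class Step:
    role: str      # e.g. "assistant" or "tool"
    content: str   # message text or tool output


@dataclass
class Trajectory:
    steps: list[Step]
    final_answer: str


class TrajectoryEnv:
    """Hypothetical read-only environment over parallel rollouts."""

    def __init__(self, trajectories: list[Trajectory]):
        self.trajectories = trajectories

    def list_solutions(self) -> list[tuple[int, str]]:
        """Return (trajectory index, final answer) for every rollout."""
        return [(i, t.final_answer) for i, t in enumerate(self.trajectories)]

    def search(self, query: str, max_hits: int = 5) -> list[tuple[int, int, str]]:
        """Case-insensitive substring search across all trajectories.

        Returns (trajectory index, step index, snippet) tuples, capped at
        max_hits so a single call stays small relative to the context window.
        """
        hits = []
        q = query.lower()
        for ti, traj in enumerate(self.trajectories):
            for si, step in enumerate(traj.steps):
                if q in step.content.lower():
                    hits.append((ti, si, step.content[:200]))
                    if len(hits) >= max_hits:
                        return hits
        return hits

    def read(self, traj_id: int, start: int, end: int) -> list[Step]:
        """Return a bounded slice of steps from one trajectory."""
        return self.trajectories[traj_id].steps[start:end]


# Toy data: two rollouts that disagree on the final answer.
env = TrajectoryEnv([
    Trajectory(
        [Step("assistant", "Searching for the capital of Australia"),
         Step("tool", "Canberra is the capital of Australia.")],
        "Canberra",
    ),
    Trajectory(
        [Step("assistant", "I recall Sydney is the largest city")],
        "Sydney",
    ),
])

print(env.list_solutions())
print(env.search("canberra"))
```

In this sketch the aggregator never loads every trajectory at once. It first lists the candidate answers, then pulls only the evidence it needs through `search` and `read`, which is how on-demand inspection can avoid both discarding trajectory content and overflowing the context window.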