Stanford HAIApr 9, 2026arXiv:2604.08369

Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

AI Summary

This paper introduces TrACE, a training-free adaptive compute controller for LLM agents that dynamically allocates LLM calls based on inter-rollout action agreement. TrACE samples candidate actions at each step and increases compute only when agreement is low, indicating a difficult decision. Experiments on GSM8K and MiniHouse using a Qwen 2.5 3B model show TrACE matches the accuracy of fixed-budget self-consistency while using significantly fewer LLM calls (33-65% reduction).

Key Contribution

LLMs can decide when they need more "thinking time" – and boost their accuracy while slashing compute costs by up to 65% – simply by checking if they agree with themselves.

Abstract

Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents, but existing methods apply compute uniformly: every decision step receives the same budget regardless of its difficulty. We introduce TrACE (Trajectorical Adaptive Compute via agrEement), a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement. At each step, TrACE samples a small set of candidate next actions and measures how consistently the model commits to the same action. High agreement signals an easy decision; the controller commits immediately. Low agreement signals uncertainty; the controller samples additional rollouts up to a configurable cap before committing to the plurality action. No learned components, no external verifier, and no human labels are required. We evaluate TrACE against greedy decoding and fixed-budget self-consistency (SC-4, SC-8) on two benchmarks spanning single-step reasoning (GSM8K, n=50) and multi-step household navigation (MiniHouse, n=30), using a Qwen 2.5 3B Instruct model running on CPU. TrACE-4 matches SC-4 accuracy while using 33% fewer LLM calls on GSM8K and 39% fewer on MiniHouse. TrACE-8 matches SC-8 accuracy with 55% fewer calls on GSM8K and 65% fewer on MiniHouse. We further show that inter-rollout agreement is a reliable signal of step-level success, validating the core hypothesis that the model's own output consistency encodes difficulty information that can be exploited without training. TrACE is the first training-free, per-timestep adaptive-compute controller for LLM agents to be evaluated on multi-step sequential decision tasks.

Inference & Quantization Reasoning & Chain-of-Thought Tool Use & Agents

Citation Metrics

Citations0

Influential citations0

References16

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

Related Papers