UCSD | Jun 2, 2026 | arXiv:2606.03965

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Yu Xia, Zhouhang Xie, Xin Xu, Byungkyu Kang, Prarit Lamba, Xiangbo Gao, Julian McAuley

AI Summary

This paper introduces Agentic Chain-of-Thought Steering (ACTS), a novel framework that formulates reasoning steering as a Markov decision process, allowing a controller agent to adaptively guide a frozen reasoner during inference. By utilizing synthetic steering trajectories and optimizing through reinforcement learning, ACTS achieves significant token savings while maintaining high accuracy in reasoning tasks. Experimental results demonstrate that ACTS not only matches the performance of full-thinking approaches but also provides controllable trade-offs between accuracy and efficiency across various benchmarks.

Key Contribution

ACTS achieves full-thinking performance with up to 40% fewer tokens, enabling precise control over reasoning efficiency and accuracy.

Abstract

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks implicit. In this paper, we propose Agentic Chain-of-Thought Steering (ACTS), which formulates reasoning steering as a Markov decision process where a controller agent adaptively steers a frozen reasoner during inference. At each step, the controller observes the reasoning trace and remaining thinking budget, then issues a steering action consisting of a reasoning strategy and a steering phrase that initiates the next reasoner step. This enables budget-aware strategy control for efficient reasoning while preserving the reasoner's generation continuity. We initialize the controller agent from our constructed synthetic steering trajectories with multi-budget augmentation, and further optimize it via reinforcement learning with budget-conditioned reward shaping. Experiments across multiple benchmarks show that ACTS matches full-thinking performance with substantial token savings, and enables controllable accuracy-efficiency trade-offs across different reasoners and tasks. The code is available at https://github.com/Andree-9/ACTS.
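The control loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SteeringAction`, `steer`, `toy_controller`, and `toy_reasoner`, the whitespace token counter, the "conclude" stopping rule, and the threshold of 12 tokens are all hypothetical stand-ins. In ACTS the controller is a trained agent and the reasoner is a frozen LLM; here both are replaced by toy functions so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SteeringAction:
    strategy: str  # hypothetical label for the reasoning strategy, e.g. "continue" or "conclude"
    phrase: str    # steering phrase that opens the next reasoner step


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


def steer(
    question: str,
    controller: Callable[[str, List[str], int], SteeringAction],
    reasoner: Callable[[str, List[str], str], str],
    budget: int,
    max_steps: int = 16,
) -> Tuple[List[str], int]:
    """Run one steered episode; return the trace and the tokens used."""
    trace: List[str] = []
    remaining = budget
    for _ in range(max_steps):
        # The controller observes the trace so far and the remaining budget.
        action = controller(question, trace, remaining)
        # The reasoner continues from the steering phrase; its weights are untouched.
        step = action.phrase + reasoner(question, trace, action.phrase)
        trace.append(step)
        remaining -= count_tokens(step)
        # Assumed stopping rule: stop on a "conclude" action or an exhausted budget.
        if action.strategy == "conclude" or remaining <= 0:
            break
    return trace, budget - remaining


def toy_controller(question: str, trace: List[str], remaining: int) -> SteeringAction:
    # Hypothetical budget-aware policy: wrap up once the budget runs low.
    if remaining <= 12:
        return SteeringAction("conclude", "So the final answer is")
    return SteeringAction("continue", "Let me break this down:")


def toy_reasoner(question: str, trace: List[str], phrase: str) -> str:
    # Stand-in for a frozen LLM that continues the text after the phrase.
    if phrase.startswith("So"):
        return " 42."
    return " and here is a short continuation."


if __name__ == "__main__":
    trace, used = steer("What is 6 * 7?", toy_controller, toy_reasoner, budget=30)
    for step in trace:
        print(step)
    print("tokens used:", used)
```

Because the controller only chooses how each step begins, the reasoner still generates every step itself, which is one way to read the abstract's point about preserving generation continuity. Changing `budget` changes when the toy policy switches to "conclude", which mirrors the budget-conditioned behavior in a very simplified form.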

Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 51

Year: 2026

Venue: N/A
