This paper introduces Sufficiency-guided Continuous Adaptive Reasoning (SuCo), a framework that addresses the inefficiency of Large Reasoning Models (LRMs) by building on Minimal Sufficient Chain-of-Thought (MSC). Defining MSC as the shortest prefix of a reasoning trajectory that suffices to produce the correct answer, the authors show that it not only reduces the number of reasoning tokens but also improves accuracy across difficulty levels. The two-stage training process, comprising MSC-Aligned Fine-Tuning and Sufficiency-Aware Policy Optimization, improves both reasoning efficiency and accuracy in evaluations across mathematics, code, and science benchmarks.
Reducing reasoning tokens while boosting accuracy, SuCo transforms how LRMs approach problem-solving by focusing on sufficiency rather than excess.
Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chains-of-Thought (CoT), inflating computational costs even for simple queries. Existing efforts to mitigate this inefficiency typically rely on discrete reasoning modes or fixed budget tiers, lacking a principled criterion of when reasoning is sufficient. In this work, we introduce Minimal Sufficient CoT (MSC), defined as the shortest prefix of a CoT trajectory that is adequate for producing the correct answer. We empirically show that MSC not only reduces reasoning tokens, but also improves accuracy across difficulty levels. Building on MSC, we propose Sufficiency-guided Continuous Adaptive Reasoning (SuCo), a two-stage training framework for autonomous reasoning control along a continuous spectrum. In stage 1, MSC-Aligned Fine-Tuning (MFT) constructs MSC data using problem-adaptive sufficiency thresholds that naturally scale with question difficulty, then fine-tunes the model to internalize concise yet sufficient reasoning patterns. In stage 2, Sufficiency-Aware Policy Optimization (SAPO) further optimizes the model through reinforcement learning with dynamic complexity tracking and sufficiency-aware rewards that penalize both over- and under-thinking. Extensive experiments across mathematics, code, and science benchmarks show that SuCo consistently achieves improvements in both accuracy and reasoning efficiency.
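To make the MSC definition concrete, the following is a minimal sketch of how a shortest sufficient prefix might be located, together with a toy reward that penalizes both over- and under-thinking. It is not the authors' implementation: the abstract does not specify how sufficiency is estimated or how the reward is shaped. The names `sample_answer`, `threshold`, `n_samples`, `target_tokens`, and `alpha` are all hypothetical, and the sketch assumes sufficiency can be approximated by the fraction of sampled answers that match a reference when the model is forced to answer after a given prefix.

```python
from typing import Callable, List, Optional, Sequence


def sufficiency(prefix: Sequence[str],
                sample_answer: Callable[[Sequence[str]], str],
                reference: str,
                n_samples: int = 8) -> float:
    """Estimate how sufficient `prefix` is (assumed estimator, not from the paper).

    `sample_answer` is a hypothetical callable that forces the model to
    answer after seeing only `prefix`; we return the fraction of samples
    that match the reference answer.
    """
    hits = sum(sample_answer(prefix) == reference for _ in range(n_samples))
    return hits / n_samples


def minimal_sufficient_prefix(steps: Sequence[str],
                              sample_answer: Callable[[Sequence[str]], str],
                              reference: str,
                              threshold: float = 0.75,
                              n_samples: int = 8) -> Optional[List[str]]:
    """Return the shortest prefix of `steps` whose estimated sufficiency
    reaches `threshold`, or None if even the full trajectory falls short."""
    for k in range(len(steps) + 1):  # scan prefixes from shortest to longest
        prefix = steps[:k]
        if sufficiency(prefix, sample_answer, reference, n_samples) >= threshold:
            return list(prefix)
    return None


def length_aware_reward(correct: bool,
                        n_tokens: int,
                        target_tokens: int,
                        alpha: float = 0.5) -> float:
    """Toy reward (an assumption, not the paper's SAPO reward).

    Correct answers lose reward for tokens beyond the target (over-thinking);
    wrong answers are penalized for stopping short of it (under-thinking).
    """
    if target_tokens <= 0:
        raise ValueError("target_tokens must be positive")
    if correct:
        excess = max(0, n_tokens - target_tokens) / target_tokens
        return 1.0 - alpha * min(excess, 1.0)
    shortfall = max(0, target_tokens - n_tokens) / target_tokens
    return -alpha * shortfall


# Deterministic stand-in for a model: it answers correctly only once the
# prefix contains the step that performs the multiplication.
steps = [
    "Restate the problem.",
    "Compute 6 * 7 = 42.",
    "Double-check by repeated addition.",
]


def stub_answer(prefix: Sequence[str]) -> str:
    return "42" if any("6 * 7" in s for s in prefix) else "unknown"


msc = minimal_sufficient_prefix(steps, stub_answer, reference="42")
print(msc)  # the first two steps; the verification step is dropped
```

With a real model, `sample_answer` would be stochastic, so the estimate depends on `n_samples`. A linear scan costs one batch of samples per prefix. A binary search over prefix length would be cheaper, but it is only valid if sufficiency is assumed to be monotone in prefix length.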