Jun 16, 2026 | arXiv:2606.17687

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

Jiahao Wang, Bingyu Liang, Chenhao Hu, Longhui Zhang, Xuebo Liu, Min Zhang, Jing Li, Xuelong Li

AI Summary

This paper introduces Sufficiency-guided Continuous Adaptive Reasoning (SuCo), a framework that addresses the tendency of Large Reasoning Models (LRMs) to generate excessively long reasoning, even for simple queries. It builds on the Minimal Sufficient Chain-of-Thought (MSC), defined as the shortest prefix of a CoT trajectory that is adequate for producing the correct answer. The authors report empirically that MSC reduces reasoning tokens while also improving accuracy across difficulty levels. SuCo trains in two stages, MSC-Aligned Fine-Tuning (MFT) followed by Sufficiency-Aware Policy Optimization (SAPO), and experiments on mathematics, code, and science benchmarks show consistent gains in both accuracy and reasoning efficiency.

Key Contribution

SuCo reduces reasoning tokens while improving accuracy by training LRMs to stop once their reasoning is sufficient, rather than relying on discrete reasoning modes or fixed budget tiers.

Abstract

Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chains-of-Thought (CoT), inflating computational costs even for simple queries. Existing efforts to mitigate this inefficiency typically rely on discrete reasoning modes or fixed budget tiers, lacking a principled criterion for when reasoning is sufficient. In this work, we introduce Minimal Sufficient CoT (MSC), defined as the shortest prefix of a CoT trajectory that is adequate for producing the correct answer. We empirically show that MSC not only reduces reasoning tokens, but also improves accuracy across difficulty levels. Building on MSC, we propose Sufficiency-guided Continuous Adaptive Reasoning (SuCo), a two-stage training framework for autonomous reasoning control along a continuous spectrum. In stage 1, MSC-Aligned Fine-Tuning (MFT) constructs MSC data using problem-adaptive sufficiency thresholds that naturally scale with question difficulty, then fine-tunes the model to internalize concise yet sufficient reasoning patterns. In stage 2, Sufficiency-Aware Policy Optimization (SAPO) further optimizes the model through reinforcement learning with dynamic complexity tracking and sufficiency-aware rewards that penalize both over- and under-thinking. Extensive experiments across mathematics, code, and science benchmarks show that SuCo consistently achieves improvements in both accuracy and reasoning efficiency.
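The MSC definition above (the shortest prefix of a CoT trajectory that still yields the correct answer) can be illustrated with a minimal sketch. This is not the authors' construction procedure: the abstract mentions problem-adaptive sufficiency thresholds whose details are not given here, so the sketch substitutes a simple exact-match check. The names `minimal_sufficient_prefix`, `answer_from_prefix`, and `toy_answerer` are hypothetical, and the trajectory is assumed to be already split into steps.

```python
from typing import Callable, Optional, Sequence


def minimal_sufficient_prefix(
    steps: Sequence[str],
    answer_from_prefix: Callable[[Sequence[str]], str],
    gold_answer: str,
) -> Optional[int]:
    """Return the smallest k such that answering from steps[:k] gives gold_answer.

    Assumption: sufficiency is judged by exact match on a single answer,
    a stand-in for whatever sufficiency threshold the paper actually uses.
    Returns None if no prefix (including the full trajectory) is sufficient.
    """
    # Linear scan from the shortest prefix: sufficiency is not assumed to be
    # monotone in k, so a binary search could miss the true minimum.
    for k in range(len(steps) + 1):
        if answer_from_prefix(steps[:k]) == gold_answer:
            return k
    return None


def toy_answerer(prefix: Sequence[str]) -> str:
    """Hypothetical stand-in for a model: reads the last '=' result in the prefix."""
    results = [s.split("=")[-1].strip() for s in prefix if "=" in s]
    return results[-1] if results else ""


steps = ["2 + 3 = 5", "5 * 4 = 20", "Double-check: 20 is right"]
k = minimal_sufficient_prefix(steps, toy_answerer, gold_answer="20")
msc = steps[:k] if k is not None else None
```

In this toy trajectory the third step only re-verifies the result, so the sufficient prefix ends after the second step. In practice `answer_from_prefix` would be a call to the model that forces an answer from the truncated reasoning, which is an assumption of this sketch rather than something the abstract specifies.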

Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
