The paper introduces Stepwise Adaptive Thinking (SAT), a framework for dynamically adjusting the reasoning chain length of Large Reasoning Models (LRMs) based on step-level difficulty. SAT uses a Process Reward Model (PRM) to navigate a Finite-State Machine (FSM) with four thinking modes (Slow, Normal, Fast, Skip), pruning easy steps and preserving depth for harder ones. Experiments across 9 LRMs and 7 benchmarks show that SAT reduces reasoning tokens by up to 40% while generally maintaining or improving accuracy.
LRMs can cut up to 40% of reasoning tokens while generally maintaining accuracy by dynamically adjusting their "thinking speed" at each step.
Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking", generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
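The abstract names the four thinking modes but does not give the transition rules. As a rough illustration only, the sketch below treats the modes as an ordered scale and moves one state per step according to a per-step PRM score. The thresholds `low` and `high`, the one-state-at-a-time rule, the reading of a high score as an easy step, and the names `next_mode` and `trace_modes` are all assumptions made for this example, not details taken from the paper.

```python
from enum import Enum


class Mode(Enum):
    """Thinking modes named in the abstract, ordered from most to least effort."""
    SLOW = 0
    NORMAL = 1
    FAST = 2
    SKIP = 3


def next_mode(mode: Mode, prm_score: float, low: float = 0.4, high: float = 0.8) -> Mode:
    """Hypothetical transition rule (not from the paper).

    A high PRM score is read as "this step is easy", so the machine moves
    one state toward SKIP. A low score is read as "this step is hard", so it
    moves one state toward SLOW. Scores in between leave the mode unchanged.
    """
    if prm_score >= high:
        return Mode(min(mode.value + 1, Mode.SKIP.value))
    if prm_score < low:
        return Mode(max(mode.value - 1, Mode.SLOW.value))
    return mode


def trace_modes(scores, start: Mode = Mode.NORMAL):
    """Return the mode chosen after each step, given one PRM score per step."""
    mode = start
    modes = []
    for score in scores:
        mode = next_mode(mode, score)
        modes.append(mode)
    return modes


if __name__ == "__main__":
    # Made-up scores: two easy steps, one borderline step, three hard steps.
    for step, mode in enumerate(trace_modes([0.9, 0.9, 0.5, 0.2, 0.1, 0.1]), start=1):
        print(step, mode.name)
```

In a real system the chosen mode would then control how much reasoning text the model is allowed to generate for the next step; that part is omitted here because the abstract does not describe it.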