BaiduShenzhen UniversityMar 18, 2026arXiv:2603.17449

TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL

Tingcheng Bian, Jinchang Luo, Mingquan Cheng, Jinyu Zhang, Xiaoling Xia, Ni Li, Yan Tao, Haiwei Wang

AI Summary

This paper introduces Minimal Sufficient Length (MSL), a theoretical metric representing the shortest reasoning length required for correct answers from LLMs. They provide a recursive definition for MSL based on independently sampled sequences and prove its existence, establishing a measurable lower bound for reasoning chain compression. Based on this, they propose TRiMS, an RL-based method using the GRPO algorithm and MSL estimation to achieve over 80% CoT token reduction with a slight accuracy increase.

Key Contribution

LLMs can slash over 80% of their chain-of-thought tokens with a minor accuracy boost, thanks to a new RL-based method that targets the "Minimal Sufficient Length" of reasoning.

Abstract

Large language models achieve breakthroughs in complex reasoning via long chain-of-thought sequences. However, this often leads to severe reasoning inflation, causing substantial computational redundancy. To maximize Intelligence per Token, we introduce a theoretical metric, MSL-Minimal Sufficient Length. MSL rigorously characterizes the shortest reasoning length that preserves answer correctness. We provide a recursive definition based on independently sampled sequences and prove the existence of its limit, establishing the first measurable lower bound for reasoning-chain compression. Building on an analysis of mainstream CoT compression strategies, we identify key structural factors enabling a model to approach MSL. Based on these insights, we propose TRiMS which employs the GRPO algorithm in conjunction with MSL-based estimation during training, while mitigating instabilities during the training process through dynamic batch aggregation and advantage computation using batch-level standard deviation. TRiMS achieves over 80% CoT token reduction with a minor accuracy boost across all benchmarks.

Reasoning & Chain-of-Thought Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL

Related Papers