The paper introduces PACE, a dual-level framework for compressing reasoning traces in Language Reasoning Models (LRMs), addressing overthinking and excessive token usage. At the sequence level, PACE uses prefix-protected optimization with decaying mixed rollouts to preserve valid reasoning paths while encouraging conciseness; at the group level, a difficulty-aware penalty dynamically adjusts length constraints to query complexity. Experiments on DeepSeek-R1-Distill-Qwen models (1.5B/7B) show that PACE achieves up to 55.7% token reduction and up to 4.1% accuracy improvement on math benchmarks, while generalizing to code, science, and general domains.
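The sequence-level idea is easiest to see in code. The sketch below is a minimal illustration, not the paper's implementation: the linear decay schedule for the protected-prefix fraction and the stand-in `sample_fn` for policy decoding are both hypothetical.

```python
def protected_prefix_len(ref_len: int, step: int, total_steps: int, p0: float = 0.5) -> int:
    """Length of the protected prefix: a fraction of a known-valid reference
    trace, decaying linearly from p0 to 0 over training (illustrative schedule,
    not the paper's exact formulation)."""
    frac = p0 * max(0.0, 1.0 - step / total_steps)
    return int(ref_len * frac)

def mixed_rollout(sample_fn, query, ref_trace, step, total_steps):
    """Mixed rollout: copy the first k tokens of a valid reference trace, then
    let the policy sample only the continuation. Early deduction steps are thus
    shielded from compression; later steps remain free to be shortened."""
    k = protected_prefix_len(len(ref_trace), step, total_steps)
    prefix = ref_trace[:k]
    continuation = sample_fn(query, prefix)  # policy decoding, stubbed here
    return prefix + continuation

# Toy usage with a dummy sampler that just emits an answer token.
if __name__ == "__main__":
    ref = ["step1", "step2", "step3", "step4", "answer"]
    dummy = lambda q, p: ["answer"]
    print(mixed_rollout(dummy, "2+2?", ref, step=0, total_steps=100))   # half the trace protected
    print(mixed_rollout(dummy, "2+2?", ref, step=90, total_steps=100))  # almost nothing protected
```

Early in training, most of a known-valid trace is copied verbatim and only its tail can be compressed; as the protected fraction decays, progressively more of the trace becomes subject to compression.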
Achieve up to 55.7% token reduction and 4.1% accuracy improvement in language reasoning by selectively compressing reasoning traces, showing that less can be more.
Language Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from ``overthinking'', producing excessively long reasoning traces that increase latency and memory usage. Existing LRMs typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose \textbf{PACE}, a dual-level framework for prefix-protected and difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to preserve valid reasoning paths while promoting conciseness. At the group level, a difficulty-aware penalty dynamically scales length constraints with query complexity, maintaining exploration on harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) demonstrate that PACE achieves a substantial reduction in token usage (up to \textbf{55.7\%}) while simultaneously improving accuracy (up to \textbf{4.1\%}) on math benchmarks, and generalizes to code, science, and general domains.
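For the group-level mechanism, a rough sketch is shown below. It assumes the group's pass rate as the difficulty proxy (a common choice in GRPO-style training) and a simple linear scaling of a normalized length penalty; both choices are illustrative assumptions, not PACE's published formula.

```python
def difficulty_aware_rewards(rollouts, max_len, alpha=0.2):
    """Group-level reward sketch: estimate query difficulty from the group's
    pass rate and scale a normalized length penalty by it, so easy queries
    (high pass rate) are penalized for verbosity while hard queries keep
    room to explore. The scaling rule is an assumption, not PACE's formula.

    rollouts: list of (is_correct, num_tokens) pairs for one query's group.
    """
    pass_rate = sum(1 for ok, _ in rollouts if ok) / len(rollouts)
    penalty_scale = alpha * pass_rate  # low pass rate (hard query) -> weak penalty
    return [
        (1.0 if ok else 0.0) - penalty_scale * (n / max_len)
        for ok, n in rollouts
    ]

# Easy query (all correct): longer traces are penalized more than short ones.
print(difficulty_aware_rewards([(True, 200), (True, 800)], max_len=1000))
# Hard query (half wrong): the penalty shrinks, preserving exploration.
print(difficulty_aware_rewards([(False, 900), (True, 950)], max_len=1000))
```

The design intent is that the penalty only bites where the model already solves the query reliably; on hard queries the weak penalty leaves long exploratory traces largely unpunished.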