Baidu, HKU, QMUL, Xiamen University, XJTU

May 27, 2026

arXiv:2605.28713

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

Guoxin Ma, Chengzhengxu Li, Yu Liang, Yueyang Zhang, Kecheng Chen, Zhaohan Zhang, Zhiyuan Sun

AI Summary

This paper introduces Thinking as Compression (TaC), a novel context compression paradigm that leverages the intrinsic capabilities of LLMs to compress long contexts by generating task-relevant "thinking traces." TaC-C, a refined version of TaC, uses a reward-driven optimization framework to elicit compact and controllable compressed context, addressing budget control and shortcut behaviors. Experiments on long-context QA benchmarks demonstrate that TaC-C significantly outperforms existing compression methods, achieving up to 23.4% higher F1 score and 21.7% higher Exact Match Score at high compression ratios.

Key Contribution

LLMs can compress context better than dedicated compression modules, simply by prompting them to "think" about the task.

Abstract

Context compression aims to shorten long context inputs with minimal information loss for LLM inference acceleration. While existing methods have shown promise, they typically rely on complex compression modules or compression-specific training, leaving the intrinsic capabilities of LLMs underexplored. In contrast, this work reveals that a thinking model itself can naturally compress long contexts by organizing task-relevant information. We thus derive Thinking as Compression (TaC), a new compression paradigm that treats thinking itself as compressed context. Without relying on a dedicated compressor, TaC directly prompts the thinking model to generate thinking traces as the shortened context, and this alone already outperforms most representative compression methods. Further, given that raw thinking output may struggle with budget control and shortcut behaviors, we introduce Thinking as Compression Constrained (TaC-C), which leverages a simple reward-driven optimization framework to elicit intrinsic thinking as compact and controllable compressed context. Experiments across four long-context QA benchmarks demonstrate that TaC-C consistently outperforms existing baselines. At 4x and 8x compression ratios, it surpasses the strongest competitor by 17.4% and 23.4% in average F1, and by 15.7% and 21.7% in average Exact Match Score (EM), respectively.
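The quantities reported above (compression ratio, F1, and EM) can be illustrated with a minimal sketch. It assumes the common SQuAD-style answer normalization and token-overlap F1; the abstract does not state which tokenizer or normalization the authors use, so the function names and normalization rules here are illustrative assumptions rather than the paper's evaluation code.

```python
import re
import string
from collections import Counter


def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Original length divided by compressed length; 4.0 means 4x compression."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    return original_tokens / compressed_tokens


def normalize(text: str) -> list[str]:
    """Assumed SQuAD-style normalization: lowercase, strip punctuation
    and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a context of 8,000 tokens replaced by a 1,000-token thinking trace corresponds to an 8x compression ratio, and the F1 and EM of the answer produced from that trace are averaged over each benchmark's questions.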

Inference & Quantization · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
