This paper introduces Selective Latent Thinking (SLT), a method for adaptively compressing LLM reasoning chains by selectively encoding redundant spans into latent representations while preserving critical steps in explicit chain-of-thought (CoT) form. SLT uses a confidence-based gating mechanism, guided by a lightweight decoder predicting upcoming reasoning spans, to determine which spans can be reliably compressed. Experiments on mathematical reasoning benchmarks show SLT achieves significantly higher accuracy than existing latent reasoning methods at similar compression ratios, while also substantially reducing reasoning chain length compared to explicit CoT.
LLMs can compress their reasoning by over 50% with minimal accuracy loss by selectively applying latent compression only to redundant steps.
Explicit chain-of-thought (CoT) reasoning substantially improves the reasoning ability of large language models (LLMs), but incurs high inference cost due to lengthy autoregressive traces. Existing latent reasoning methods offer a promising alternative, yet they often treat reasoning as uniformly compressible, so precision-critical intermediate steps are over-compressed and reasoning accuracy degrades. In this work, we propose Selective Latent Thinking (SLT), a framework that selectively compresses redundant reasoning spans into latent representations while preserving precision-critical spans as explicit CoT within the same reasoning trajectory. Specifically, SLT first uses a lightweight decoder to anticipate a short upcoming reasoning span, and then applies confidence-based gating to determine the longest span that can be reliably compressed. The accepted span is encoded into a compact latent representation to improve reasoning efficiency, while uncertain or precision-critical reasoning remains in explicit CoT form to preserve accuracy. To learn this selective compression policy, SLT adopts a three-stage training strategy that combines span-level latent compression, reliability-aware future reasoning prediction, and trajectory-level reinforcement learning to optimize the trade-off between answer correctness and reasoning cost. Extensive experiments across four mathematical reasoning benchmarks demonstrate that SLT achieves 22.7% higher accuracy than latent reasoning baselines at comparable compression ratios, while reducing reasoning chain length by 58.4% with only 2.8% accuracy degradation compared to explicit CoT. Our code is available at https://github.com/hunshi34/SLT.
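
The gating step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual gating rule, so everything here is an assumption made for illustration: the function names `longest_reliable_prefix` and `gate_span`, the per-token confidence scores, and the fixed threshold of 0.9 are all hypothetical and are not taken from the SLT code. The sketch accepts the longest prefix of a predicted span whose confidences all meet the threshold, and leaves the remainder as explicit CoT.

```python
from typing import List, Sequence, Tuple


def longest_reliable_prefix(confidences: Sequence[float], threshold: float = 0.9) -> int:
    """Return the length of the longest prefix whose confidences all meet the threshold.

    Hypothetical gating rule: the paper's actual criterion may differ.
    """
    length = 0
    for confidence in confidences:
        if confidence < threshold:
            break  # first unreliable token ends the compressible span
        length += 1
    return length


def gate_span(
    tokens: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.9,
) -> Tuple[List[str], List[str]]:
    """Split a predicted span into (compressible prefix, explicit remainder)."""
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    k = longest_reliable_prefix(confidences, threshold)
    # The prefix would be encoded into a latent representation;
    # the remainder stays as explicit chain-of-thought tokens.
    return list(tokens[:k]), list(tokens[k:])


# Illustrative values only: a predicted span and made-up confidence scores.
tokens = ["so", "we", "get", "x", "=", "17"]
confidences = [0.99, 0.97, 0.95, 0.62, 0.98, 0.91]
compressed, explicit = gate_span(tokens, confidences)
print(compressed)  # ['so', 'we', 'get']
print(explicit)    # ['x', '=', '17']
```

In this toy input the low-confidence token `x` stops the compressible span, so the numeric result stays explicit. A prefix rule is only one plausible reading of "the longest span that can be reliably compressed"; in SLT the policy is learned through the three-stage training described above rather than set by a fixed threshold.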