The paper introduces Progressive Thought Encoding, a parameter-efficient fine-tuning method for large reasoning models (LRMs) that addresses the memory bottleneck during reinforcement learning (RL) training caused by long rollouts and autoregressive decoding. This method encodes intermediate reasoning steps into fixed-size vector representations, eliminating the need to backpropagate through full-cache rollouts and maintaining constant memory usage during inference. Experiments on mathematical benchmarks demonstrate that Progressive Thought Encoding significantly improves reasoning accuracy compared to LoRA-based fine-tuning and models without fine-tuning, while also enhancing training efficiency under tight memory constraints.
Forget full-cache rollouts: this parameter-efficient fine-tuning method lets large reasoning models reason effectively under fixed-size caches while cutting memory usage during RL training.
Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, and autoregressive decoding dominates both time and memory usage during those rollouts. Sliding-window cache strategies can bound memory, but they disrupt long-context reasoning and degrade performance. We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches. By progressively encoding intermediate reasoning into fixed-size vector representations, our approach eliminates the need to backpropagate through full-cache rollouts, which reduces training memory, and it keeps memory constant during inference. Experiments on three models (Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, and DeepSeek-R1-Distill-Llama-8B) across six widely used, challenging mathematical benchmarks show consistent gains: on average, our method achieves a +19.3% improvement over LoRA-based fine-tuning and +29.9% over LRMs without fine-tuning, with up to a +23.4 accuracy improvement on AIME2024/2025 under the same tight cache budgets. These results demonstrate that Progressive Thought Encoding not only improves reasoning accuracy but also makes RL training of LRMs substantially more efficient and scalable under real-world memory constraints.
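To make the fixed-size-cache idea concrete, the toy sketch below contrasts a plain sliding window with a cache that folds evicted entries into one fixed-size summary vector. It is an illustration under stated assumptions, not the paper's method: the class `BoundedCache`, its `decay` parameter, and the exponential-moving-average update are hypothetical stand-ins for whatever learned encoder the authors actually train.

```python
from collections import deque


class BoundedCache:
    """Toy fixed-size cache (hypothetical, for illustration only).

    A plain sliding window would discard evicted entries outright. Here each
    evicted entry is folded into a single fixed-size summary vector, so the
    number of stored vectors never exceeds window + 1.
    """

    def __init__(self, window: int, dim: int, decay: float = 0.9):
        self.window = window          # max number of recent entries kept verbatim
        self.dim = dim                # size of each vector and of the summary
        self.decay = decay            # assumed EMA factor, not from the paper
        self.recent = deque()         # most recent entries, oldest first
        self.summary = [0.0] * dim    # fixed-size encoding of evicted entries
        self.evicted = 0              # count of entries folded into the summary

    def append(self, vec):
        if len(vec) != self.dim:
            raise ValueError("vector has wrong dimension")
        self.recent.append(list(vec))
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            # Stand-in for a learned encoder: exponential moving average.
            self.summary = [
                self.decay * s + (1.0 - self.decay) * o
                for s, o in zip(self.summary, old)
            ]
            self.evicted += 1

    def memory_slots(self) -> int:
        """Number of vectors held: recent entries plus one summary."""
        return len(self.recent) + 1


# Simulate a 100-step rollout with a window of 4 entries.
cache = BoundedCache(window=4, dim=3)
for step in range(100):
    cache.append([float(step)] * 3)

print(cache.memory_slots(), cache.evicted)
```

However long the rollout runs, the cache holds at most `window + 1` vectors, which is the constant-memory property the abstract describes. The quality of reasoning under such a budget would depend on how well the summary preserves the evicted context, and a simple moving average is unlikely to match a trained encoder.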