Amii | UAlberta | Feb 16, 2026 | arXiv:2602.14468

LACONIC: Length-Aware Constrained Reinforcement Learning for LLM

Chang Liu, Lawrence Liu, Yaoqi Ye, Csaba Szepesvári, Lin F. Yang

AI Summary

The paper introduces LACONIC, a reinforcement learning method for LLMs that enforces a target token budget during training by augmenting the RL objective with a length-based cost. The cost scale is adaptively adjusted to balance brevity and task performance, providing a more robust approach to length control than fixed heuristic reward shaping. Experiments on mathematical reasoning models demonstrate that LACONIC preserves or improves pass@1 while reducing output length by over 50%, and maintains out-of-domain performance with significantly fewer tokens.

Key Contribution

Get 50% shorter LLM responses without sacrificing accuracy using a new RL method that dynamically balances task reward and length constraints.

Abstract

Reinforcement learning (RL) has enhanced the capabilities of large language models (LLMs) through reward-driven training. Nevertheless, this process can introduce excessively long responses, inflating inference latency and computational overhead. Prior length-control approaches typically rely on fixed heuristic reward shaping, which can misalign with the task objective and require brittle tuning. In this work, we propose LACONIC, a reinforcement learning method that enforces a target token budget during training. Specifically, we update policy models using an augmented objective that combines the task reward with a length-based cost. To balance brevity and task performance, the cost scale is adaptively adjusted throughout training. This yields robust length control while preserving task reward. We provide a theoretical guarantee that supports the method. Across mathematical reasoning models and datasets, LACONIC preserves or improves pass@1 while reducing output length by over 50%. It maintains out-of-domain performance on general knowledge and multilingual benchmarks with 44% fewer tokens. Moreover, LACONIC integrates into standard RL-tuning with no inference changes and minimal deployment overhead.
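The abstract describes an augmented objective (task reward combined with a length-based cost) whose cost scale is adjusted during training. The following is a minimal sketch of that general idea, not the paper's algorithm: the class name `LengthCostController`, the normalized-excess cost, and the dual-ascent-style update rule with `step_size` are all assumptions made for illustration, since the abstract does not specify the cost function or the update.

```python
from dataclasses import dataclass


@dataclass
class LengthCostController:
    """Hypothetical helper: tracks an adaptive cost scale for a token budget."""

    budget: int              # target token budget (assumed hyperparameter)
    step_size: float = 0.01  # update rate for the cost scale (assumed)
    scale: float = 0.0       # current cost scale, kept non-negative

    def shaped_reward(self, task_reward: float, num_tokens: int) -> float:
        # Assumed cost: tokens beyond the budget, normalized by the budget.
        cost = max(0, num_tokens - self.budget) / self.budget
        # Augmented objective: task reward minus the scaled length cost.
        return task_reward - self.scale * cost

    def update(self, mean_tokens: float) -> float:
        # Assumed update: raise the scale when the batch's average length
        # exceeds the budget, lower it otherwise, and clip at zero.
        violation = (mean_tokens - self.budget) / self.budget
        self.scale = max(0.0, self.scale + self.step_size * violation)
        return self.scale
```

In a training loop, one would call `shaped_reward` on each sampled response before the policy update and `update` once per batch with the batch's mean response length. Under this sketch the penalty vanishes on its own once responses fit the budget, which is one way to avoid hand-tuning a fixed length penalty.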

Inference & Quantization | RLHF & Preference Learning | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
