This paper reframes decoding in language models as a principled optimization problem on the probability simplex, where the objective balances model score with structural preferences. The authors demonstrate that common decoding strategies such as greedy decoding, Top-K, and Top-P can be derived as special cases within this framework. They introduce Best-of-K (BoK), a novel KL-anchored coverage objective for multi-sample pipelines, and show that it improves accuracy on benchmarks such as MATH500 (e.g., +18.6% for Qwen2.5-Math-7B).
Forget heuristic knob-tuning: decoding algorithms are just regularized optimization problems hiding in plain sight.
Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimization layer: at each token, we solve a regularized problem over the probability simplex that trades off model score against structural preferences and constraints. This single template recovers greedy decoding, Softmax sampling, Top-K, Top-P, and Sparsemax-style sparsity as special cases, and explains their common structure through optimality conditions. More importantly, the framework makes it easy to invent new decoders without folklore. We demonstrate this by designing Best-of-K (BoK), a KL-anchored coverage objective aimed at multi-sample pipelines (self-consistency, reranking, verifier selection). BoK targets the probability of covering good alternatives within a fixed K-sample budget and improves empirical performance: for example, +18.6% accuracy for Qwen2.5-Math-7B on MATH500 at high sampling temperatures.
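To make the template concrete, here is a minimal sketch (ours, not the paper's code) of the per-token problem max_p ⟨p, z⟩ − Ω(p) over the simplex, where z are the model's logits. Three standard regularizers have well-known closed-form solutions: Ω = 0 gives greedy decoding (an argmax vertex), negative entropy scaled by temperature gives Softmax sampling, and a squared-ℓ2 penalty gives Sparsemax (Martins & Astudillo, 2016).

```python
# A minimal sketch of the unified template: each decoder solves
#   max_p <p, z> - Omega(p)   subject to p on the probability simplex.
import numpy as np

def greedy(z):
    """Omega = 0: the optimum is a simplex vertex, i.e. the argmax token."""
    p = np.zeros_like(z, dtype=float)
    p[np.argmax(z)] = 1.0
    return p

def softmax_decode(z, temperature=1.0):
    """Omega = -T * H(p) (negative entropy): the optimum is softmax(z / T)."""
    s = z / temperature
    e = np.exp(s - s.max())  # shift logits for numerical stability
    return e / e.sum()

def sparsemax(z):
    """Omega = 0.5 * ||p||^2: the optimum is the Euclidean projection of z
    onto the simplex, which is typically sparse (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cssv       # tokens kept in the support
    rho = k[support][-1]                    # support size
    tau = (cssv[support][-1] - 1.0) / rho   # threshold from the KKT conditions
    return np.maximum(z - tau, 0.0)

logits = np.array([2.0, 1.0, 0.1, -1.0])
print(greedy(logits), softmax_decode(logits), sparsemax(logits))
```

Top-K and Top-P fit the same template as constraints rather than regularizers: restrict the feasible set to distributions supported on the K highest-scoring tokens, or on the smallest nucleus of cumulative mass P, and renormalize.

The abstract does not spell out BoK's objective, but the coverage probability it targets has a standard form: with K i.i.d. draws from a sampling distribution q, the chance that at least one lands in a set G of good alternatives is 1 − (1 − q(G))^K. One plausible reading of "KL-anchored" (our assumption, not the paper's stated formula, with p the model distribution and λ a trade-off weight) is then

$$\max_{q \in \Delta} \; \Bigl[\, 1 - \bigl(1 - q(G)\bigr)^{K} \,\Bigr] \;-\; \lambda \, \mathrm{KL}\bigl(q \,\|\, p\bigr),$$

which pushes mass toward covering G while the KL term keeps q anchored to the model.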