This paper introduces Min-$k$ sampling, a decoding strategy for LLMs that dynamically truncates the sorted logit distribution by analyzing its local shape to identify "semantic cliffs". Unlike Top-$k$ or Top-$p$ sampling, Min-$k$ is temperature-invariant and shows low sensitivity to hyperparameter choices, addressing a key limitation of existing probability-space truncation methods. Empirical results on reasoning benchmarks, creative writing tasks, and human evaluation show that Min-$k$ consistently improves text quality, especially under extreme temperature settings.
Forget temperature tuning: Min-$k$ sampling finds the "semantic cliff" in your LLM's logits, delivering robust and high-quality text even when other methods fall apart.
The quality of text generated by large language models depends critically on the decoding strategy. While mainstream methods such as Top-$k$, Top-$p$, and Min-$p$ balance diversity and accuracy through probability-space truncation, they share an inherent limitation: extreme sensitivity to the temperature parameter. Recent logit-space approaches such as Top-$n\sigma$ achieve temperature invariance but rely on global statistics that are susceptible to long-tail noise, and so fail to capture fine-grained confidence structures among the top candidates. We propose Min-$k$ Sampling, a novel dynamic truncation strategy that analyzes the local shape of the sorted logit distribution to identify "semantic cliffs": sharp transitions from high-confidence core tokens to uncertain long-tail tokens. By computing a position-weighted relative decay rate, Min-$k$ dynamically determines the truncation boundary at each generation step. We formally prove that Min-$k$ achieves strict temperature invariance and empirically demonstrate its low sensitivity to hyperparameter choices. Experiments on multiple reasoning benchmarks, creative writing tasks, and human evaluation show that Min-$k$ consistently improves text quality, maintaining robust performance even under extreme temperature settings where probability-based methods collapse. We make our code, models, and analysis tools publicly available.
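The abstract does not give the exact form of the position-weighted relative decay rate, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. The function names `min_k_truncate` and `sample_min_k`, the normalization of each adjacent gap by the total logit range, and the `1 / log2(position + 1)` weight are all hypothetical choices made for illustration. The sketch does show why a score built from ratios of logit differences is temperature-invariant: dividing every logit by a temperature $T$ scales numerator and denominator equally, so the chosen cut point does not move.

```python
import math
import random


def min_k_truncate(logits):
    """Return indices of the tokens kept before the largest weighted "cliff".

    ASSUMPTION: the scoring rule below is illustrative only; the paper's
    actual position-weighted relative decay rate is not given in the abstract.
    """
    # Sort token indices by logit, highest first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    s = [logits[i] for i in order]

    total = s[0] - s[-1]
    if total <= 0:
        # All logits are equal (or only one token): no cliff, keep everything.
        return order

    best_pos, best_score = 0, -1.0
    for i in range(len(s) - 1):
        # Relative decay: adjacent gap as a fraction of the total logit range.
        # A ratio of differences is unchanged when all logits are scaled by 1/T.
        rel_decay = (s[i] - s[i + 1]) / total
        # Hypothetical position weight that favors cliffs near the top.
        weight = 1.0 / math.log2(i + 2)
        score = rel_decay * weight
        if score > best_score:
            best_pos, best_score = i, score

    # Keep every token ranked at or above the cliff position.
    return order[: best_pos + 1]


def sample_min_k(logits, temperature=1.0, rng=random):
    """Sample one token index from the truncated set at the given temperature."""
    keep = min_k_truncate(logits)
    scaled = [logits[i] / temperature for i in keep]
    top = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them internally.
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(keep, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [5.0, 4.8, 4.5, 0.2, 0.1, -1.0]
    for T in (0.1, 1.0, 10.0):
        # The kept set is the same at every temperature.
        print(T, min_k_truncate([x / T for x in logits]))
```

In this toy example the gap between the third and fourth logits dominates, so the same three candidates survive at every temperature; temperature then only reshapes the probabilities among the surviving tokens, which is the behavior the abstract contrasts with probability-space truncation.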