The paper introduces Top-b, a decoding strategy for autoregressive language models that adjusts the size of the candidate set at each step according to the Shannon entropy of the model's next-token distribution. The candidate set is regulated by a dynamic bandwidth coefficient coupled strictly to this instantaneous entropy, and generation is formalized as a trajectory through a relative probability manifold. Theoretical analysis shows that Top-b minimizes variance in the tail distribution, and experiments on GPQA and GSM8K show reduced generation entropy and variance while reasoning accuracy is maintained.
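The summary does not state Top-b's exact coupling function, so the following is only a hypothetical sketch of the idea under explicit assumptions: the bandwidth coefficient b grows linearly with the entropy normalized by log V (V is the vocabulary size), between made-up bounds b_min and b_max, and a token stays in the candidate set when its probability is at least (1 - b) times the top token's probability. The names top_b_candidates, sample_top_b, b_min, and b_max are invented for illustration and are not from the paper.

```python
import math
import random
from typing import List


def shannon_entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability tokens contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_b_candidates(probs: List[float], b_min: float = 0.05,
                     b_max: float = 0.9) -> List[int]:
    """Hypothetical entropy-coupled relative-band truncation.

    ASSUMPTION (not from the paper): b = b_min + (b_max - b_min) * H / log(V),
    and token i survives if probs[i] >= (1 - b) * max(probs).
    """
    v = len(probs)
    # Normalize entropy to [0, 1] so the band does not depend on vocabulary size.
    h_norm = shannon_entropy(probs) / math.log(v) if v > 1 else 0.0
    b = b_min + (b_max - b_min) * h_norm
    # Higher entropy -> larger b -> lower relative threshold -> wider band.
    threshold = (1.0 - b) * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample_top_b(probs: List[float], rng: random.Random,
                 b_min: float = 0.05, b_max: float = 0.9) -> int:
    """Sample one token id from the truncated set, proportional to its probability."""
    keep = top_b_candidates(probs, b_min, b_max)
    weights = [probs[i] for i in keep]  # random.choices renormalizes the weights itself
    return rng.choices(keep, weights=weights, k=1)[0]
```

On a peaked distribution such as [0.9, 0.05, 0.03, 0.02], the normalized entropy is about 0.31, the band stays narrow, and only token 0 survives. On a near-uniform one such as [0.3, 0.28, 0.22, 0.2], the normalized entropy is about 0.99 and all four tokens are kept. A linear map is simply the easiest choice to read; the paper's actual coefficient may take a different form.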
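Because the candidate set is meant to widen when the distribution is high-entropy and narrow when it is low-entropy, Top-b balances exploration and exploitation adaptively instead of applying one fixed truncation rule. The reported advantage over static Top-k and Top-p is lower generation entropy and variance at competitive reasoning accuracy.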
Probabilistic language generators are theoretically modeled as discrete stochastic processes, yet standard decoding strategies (Top-k, Top-p) impose static truncation rules that fail to accommodate the dynamic information density of natural language. This misalignment often forces a suboptimal trade-off: static bounds are either too restrictive for high-entropy creative generation or too permissive for low-entropy logical reasoning. In this work, we formalize the generation process as a trajectory through a relative probability manifold. We introduce Top-b (Adaptive Relative Band Sampling), a decoding strategy that regulates the candidate set via a dynamic bandwidth coefficient coupled strictly to the instantaneous Shannon entropy of the model's distribution. We provide a theoretical framework demonstrating that Top-b acts as a variance-minimizing operator on the tail distribution. Empirical validation on GPQA and GSM8K benchmarks indicates that Top-b significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy, effectively approximating a self-regulating control system for autoregressive generation.
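For reference, the static truncation rules the abstract contrasts with Top-b can be written in a few lines. This is a generic sketch of standard Top-k and Top-p (nucleus) filtering, not code from the paper, and the two toy distributions are invented for illustration.

```python
from typing import List


def top_k_candidates(probs: List[float], k: int) -> List[int]:
    """Static Top-k: always keep the k most probable token ids."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])


def top_p_candidates(probs: List[float], p: float) -> List[int]:
    """Static Top-p: keep the smallest most-probable prefix whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:  # stop once the fixed mass target is reached
            break
    return sorted(kept)


peaked = [0.9, 0.05, 0.03, 0.02]  # low-entropy, confident step
flat = [0.3, 0.28, 0.22, 0.2]     # high-entropy, near-uniform step

print(top_k_candidates(peaked, 2), top_k_candidates(flat, 2))          # [0, 1] [0, 1]
print(top_p_candidates(peaked, 0.85), top_p_candidates(flat, 0.85))    # [0] [0, 1, 2, 3]
```

With k = 2, Top-k returns two candidates for both distributions: it admits the 0.05 token when the model is confident and cuts the 0.22 and 0.2 tokens when the mass is spread almost evenly. Top-p instead fixes a single cumulative-mass target, whatever the shape of the distribution that supplies it.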