This paper introduces Entropy-Gated Latent Recursion (EGLR), a training-free decoding method that adds a second, fully deterministic sampling axis for language-model reasoning: the layer span $L$, which sets how many of a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Crossing layer spans with $T$ temperature samples yields an $L\times T$ Cartesian sampling space, so rollout diversity no longer comes from stochastic token sampling alone. On MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches 91.6%, which is 8.2 percentage points above the temperature-only oracle (83.4%) and 10.4 points above the layer-only oracle (81.2%).
Varying the layer span adds a deterministic source of rollout diversity: on MATH-500, the joint $L\times T$ oracle gains 8.2 percentage points over temperature-only sampling and 10.4 points over layer-only sampling.
Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods derive rollout diversity from a single source: stochastic token-level sampling. We argue that this single-axis sampling space is fundamentally limiting, and identify a second, fully deterministic and complementary axis: the layer span $L$ at which a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Different choices of $L$ produce distinct rollouts that solve different subsets of problems, with no stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that re-applies the top-$L$ layers for at most $K_{\max}$ iterations until the next-token distribution converges. Combined with $T$ temperature samples, EGLR turns a single-axis stochastic rollout pool into an $L\times T$ Cartesian sampling space at almost the same per-rollout cost. We characterize this space across $8$ instruction-tuned models and $6$ math reasoning benchmarks, and show that the $L$-axis is genuinely complementary to temperature: on MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches $91.6\%$, $+8.2$ percentage points beyond the temperature-only oracle ($83.4\%$) and $+10.4$ points beyond the layer-only oracle ($81.2\%$), confirming that each axis solves problems the other misses. The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts, including self-consistency, best-of-$N$ with verifiers, and group-relative RL training (GRPO), opening a new direction for inference-time scaling that does not rely on stochastic noise.
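To make the gate-and-recurse step described in the abstract concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation. The abstract does not state the gate threshold, the convergence test, or how the hidden state is fed back, so the sketch assumes: an entropy threshold `tau`, convergence when the L1 change in the next-token distribution falls below `eps`, and feeding the final hidden state back into the bottom of the top-`span` layers. The names `eglr_next_token_dist`, `tau`, `eps`, and the toy `tanh` layers are all hypothetical, and a single hidden vector stands in for a real decoder with attention and a KV cache.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())


def eglr_next_token_dist(h, layers, unembed, span, k_max=4, tau=1.0, eps=1e-3):
    """Return (next-token distribution, number of recursion iterations used).

    Assumed details (not given in the abstract): entropy gate `tau`,
    L1 convergence tolerance `eps`, and feedback of the final hidden state.
    """
    if not 1 <= span <= len(layers):
        raise ValueError("span must be between 1 and the number of layers")

    for layer in layers:  # ordinary forward pass through the frozen stack
        h = layer(h)
    p = softmax(unembed @ h)

    if entropy(p) <= tau:  # gate closed: low-uncertainty token, decode as usual
        return p, 0

    for k in range(1, k_max + 1):
        for layer in layers[-span:]:  # re-apply only the top `span` layers
            h = layer(h)
        p_new = softmax(unembed @ h)
        converged = np.abs(p_new - p).sum() < eps
        p = p_new
        if converged:
            return p, k
    return p, k_max


def make_toy_layer(weight):
    # Stand-in for a frozen decoder layer; a real layer would include attention.
    return lambda h: np.tanh(weight @ h)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [make_toy_layer(rng.normal(size=(8, 8)) * 0.5) for _ in range(4)]
    unembed = rng.normal(size=(16, 8))
    h0 = rng.normal(size=8)
    # Each span is a separate deterministic rollout setting; no sampling occurs.
    for span in (1, 2, 3):
        p, k = eglr_next_token_dist(h0, layers, unembed, span, tau=0.5)
        print(f"span={span} iterations={k} argmax_token={int(p.argmax())}")
```

Under these assumptions, sweeping `span` with greedy decoding gives the deterministic $L$-axis, and drawing $T$ temperature samples at each span fills in the $L\times T$ grid. Tokens whose entropy stays below `tau` skip the recursion entirely, which is one plausible reading of why the per-rollout cost stays close to that of ordinary decoding.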