ETH | AI Center Tübingen | ELLIS | Max Planck | Tübingen | Apr 22, 2026 | arXiv:2604.20500

Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

Johannes Zenn, Guinan Su, Mrinmaya Sachan, Jonas Geiping

AI Summary

The paper introduces Distinct Leaf Enumeration (DLE), a deterministic decoding method for efficient test-time inference that systematically enumerates distinct leaves of a pruned decoding tree instead of sampling with replacement. DLE improves inference efficiency by increasing coverage of the search space and reusing shared prefixes, leading to less redundant token generation. Experiments on math, coding, and general reasoning tasks demonstrate that DLE explores higher-quality reasoning traces and achieves better performance compared to stochastic self-consistency.

Key Contribution

Deterministic decoding can outperform stochastic self-consistency in constrained domains by systematically exploring high-probability reasoning traces, leading to better performance with less computation.

Abstract

Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement. This strategy improves inference efficiency in two ways. Algorithmically, it increases coverage of the truncated search space under a fixed budget by exploring previously unvisited high-probability branches. Systemically, it reuses shared prefixes and reduces redundant token generation. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency, yielding better performance on math, coding, and general reasoning tasks.
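The idea of enumerating distinct leaves of a truncated decoding tree can be illustrated with a small sketch. This is not the authors' implementation: the best-first traversal order, the top-k truncation rule, the toy model `toy_next_token_probs`, and the names `truncate_top_k` and `enumerate_distinct_leaves` are all assumptions made for illustration, and the paper may use a different truncation scheme or visiting order.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Prefix = Tuple[str, ...]


def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for a language model (assumption, not a real API)."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def truncate_top_k(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize (top-k truncation)."""
    top = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


def enumerate_distinct_leaves(
    next_token_probs: Callable[[Prefix], Dict[str, float]],
    k: int,
    budget: int,
    max_len: int = 8,
) -> List[Tuple[Prefix, float]]:
    """Deterministically return up to `budget` distinct leaves of the pruned tree.

    Best-first order (an assumption): the heap is keyed on negative log-probability,
    so leaves come out from most to least probable, and no leaf is visited twice.
    """
    heap: List[Tuple[float, Prefix]] = [(0.0, ())]
    leaves: List[Tuple[Prefix, float]] = []
    while heap and len(leaves) < budget:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and (prefix[-1] == EOS or len(prefix) >= max_len):
            leaves.append((prefix, math.exp(-neg_logp)))
            continue
        # Each prefix is expanded once; all leaves below it share that work.
        for tok, p in truncate_top_k(next_token_probs(prefix), k).items():
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + (tok,)))
    return leaves


leaves = enumerate_distinct_leaves(toy_next_token_probs, k=2, budget=4)
for seq, prob in leaves:
    print(" ".join(seq), round(prob, 3))
```

With a budget of four, this sketch returns four different completions, whereas four stochastic samples from the same truncated distribution could repeat the most probable one. Because every prefix in the tree is pushed and expanded at most once, the toy model is never queried twice for the same prefix, which mirrors the prefix reuse described above.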

Code Generation & Program Synthesis | Inference & Quantization | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
