This paper introduces an approach to Inference-Time Scaling (ITS) that uses intrinsic statistics of parallel sample sets, specifically length-adjusted tail entropy, as a signal of solution quality when no ground truth is available. Intrinsic Selection (iS) ranks candidates post-hoc and improves engineering design selection by 20% over pass@1 baselines. Intrinsic Particle Filtering (iPF) applies the same signal to step-level resampling and raises pass@1 by 6.1 points on average on hard math problems. Particle Distillation (dPF) adds privileged guidance through early logit blending and KL-guided resampling, yielding gains of up to 26.5% on complex clinical responses and extending ITS to open-ended domains.
Length-adjusted tail entropy, computed over a set of parallel samples, can signal solution quality without ground truth, which lets inference-time scaling reach complex domains that lack cheap verification.
Inference-Time Scaling (ITS) has largely succeeded in verifiable domains like math and coding, where cheap verification enables scalable output selection. However, extending ITS to tasks prone to systematic failure, whether driven by faulty initial assumptions or unmet multidimensional constraints, typically relies on costly external solvers or brittle, model-based verifiers. Our key insight is that the intrinsic statistics of parallel sample sets, specifically length-adjusted tail entropy, provide a robust discriminative signal for solution quality without access to ground truth. Crucially, these statistics serve as a difficulty gate for adaptive compute allocation, dynamically routing problems across scaling regimes. First, Intrinsic Selection (iS) ranks candidates post-hoc, matching consensus-based algorithms across three domains and improving engineering design selection by 20% over pass@1 baselines. Second, Intrinsic Particle Filtering (iPF) generalizes this to step-level resampling, guiding generation toward high-confidence reasoning trajectories to improve pass@1 by 6.1 points on average on hard math problems. Finally, Particle Distillation (dPF) injects privileged guidance via early logit blending and KL-guided resampling, steering generation past systematic reasoning errors to satisfy expert rubrics, yielding up to 26.5% gains on complex clinical responses. Our pipeline applies seamlessly across broad-purpose, domain-specialized, and multimodal architectures, successfully extending ITS to open-ended domains without requiring trained reward models or exact ground-truth verification.
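The abstract does not define length-adjusted tail entropy, so the following is only a rough sketch of how post-hoc selection over parallel samples might look under stated assumptions. It assumes that "tail entropy" is the mean of the highest fraction of per-token entropies in a sample, that "length adjustment" removes the linear trend in log-length across the sample set, and that a lower adjusted score indicates a better candidate. The function names, the `tail_frac` parameter, and these definitions are all hypothetical and may differ from the authors' method.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tail_entropy(entropies: Sequence[float], tail_frac: float = 0.1) -> float:
    """ASSUMED definition: mean of the highest `tail_frac` share of token entropies."""
    if not entropies:
        raise ValueError("need at least one token entropy")
    k = max(1, math.ceil(tail_frac * len(entropies)))
    top = sorted(entropies, reverse=True)[:k]
    return sum(top) / k


def length_adjusted_scores(
    samples: Sequence[Sequence[float]], tail_frac: float = 0.1
) -> List[float]:
    """ASSUMED adjustment: residual of tail entropy after a least-squares
    fit against log-length, computed across the parallel sample set."""
    tails = [tail_entropy(s, tail_frac) for s in samples]
    log_lens = [math.log(len(s)) for s in samples]
    n = len(samples)
    mean_t = sum(tails) / n
    mean_l = sum(log_lens) / n
    var_l = sum((l - mean_l) ** 2 for l in log_lens)
    cov = sum((l - mean_l) * (t - mean_t) for l, t in zip(log_lens, tails))
    # With identical lengths there is no trend to remove.
    slope = cov / var_l if var_l > 0 else 0.0
    return [t - mean_t - slope * (l - mean_l) for t, l in zip(tails, log_lens)]


def intrinsic_select(samples: Sequence[Sequence[float]], tail_frac: float = 0.1) -> int:
    """Return the index of the candidate with the lowest adjusted score
    (ASSUMPTION: lower tail entropy means higher confidence)."""
    scores = length_adjusted_scores(samples, tail_frac)
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-token entropies for three parallel samples of one prompt.
candidates = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 2.0, 0.1],  # one highly uncertain token dominates the tail
    [0.2, 0.2, 0.2, 0.2],
]
best = intrinsic_select(candidates, tail_frac=0.25)
```

In this toy input all three candidates have the same length, so the adjustment has no effect and the candidate with the smallest entropy spike is chosen. Scoring relative to the sample set, rather than against a fixed threshold, is one way to read the abstract's emphasis on statistics of the parallel sample set as a whole; a step-level variant in the spirit of iPF would apply a similar score to partial trajectories before resampling.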