Mar 9, 2026arXiv:2603.07887

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Dylan J. Foster, Akshay Krishnamurthy

AI Summary

This paper introduces a theoretical framework, based on particle filtering and Sequential Monte Carlo (SMC), to analyze the accuracy-cost tradeoffs of inference-time sampling methods for large language models. The authors establish non-asymptotic guarantees for SMC under specific criteria, propose algorithmic improvements, and identify a fundamental limit inherent to particle filtering. Empirical results validate the theoretical criteria's influence on sampling error but highlight the need for additional theoretical considerations to fully explain final accuracy.

Key Contribution

Particle filtering reveals a fundamental limit to inference-time sampling methods for LLMs, suggesting that simply increasing the number of samples has diminishing returns.

Abstract

Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we introduce a route to rigorously study such approaches using the lens of *particle filtering* algorithms such as Sequential Monte Carlo (SMC). Given a base language model and a *process reward model* estimating expected terminal rewards, we ask: *how accurately can we sample from a target distribution given some number of process reward evaluations?* Theoretically, we identify (1) simple criteria enabling non-asymptotic guarantees for SMC; (2) algorithmic improvements to SMC; and (3) a fundamental limit faced by all particle filtering methods. Empirically, we demonstrate that our theoretical criteria effectively govern the *sampling error* of SMC, though not necessarily its final *accuracy*, suggesting that theoretical perspectives beyond sampling may be necessary.

Inference & Quantization Reasoning & Chain-of-Thought

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

Related Papers