This paper introduces a framework called probabilistic programs of thought, which makes code generation and mathematical reasoning more efficient by exploiting the next-token probability distribution of LLMs. Normally, each additional sample costs another expensive GPU generation. Instead, the proposed method turns a single generated program into a probabilistic program that compactly represents exponentially many deterministic programs, which greatly reduces the computational burden. The results show improved performance across several benchmarks while requiring fewer LLM generations.
By transforming LLM outputs into probabilistic programs, this approach slashes the computational cost of generating multiple code samples without sacrificing quality.
LLMs are widely used for code generation and mathematical reasoning tasks that require structured output: reasoning about code, generating code for a given specification, or reasoning with programs of thought. The typical approach to code generation is to prompt the model and generate samples until an appropriate program is obtained. Within this process, sampling $n$ programs from the language model requires $n$ GPU compute-intensive generations, which becomes prohibitively expensive for larger values of $n$. In this work, we address this limitation by exposing the LLM's distribution within the generated programs themselves. We propose a novel test-time framework we dub probabilistic programs of thought to obtain more samples from the model with fewer LLM generations. Given a program generated by a model and the associated next-token probabilities, we build a probabilistic program that compactly represents exponentially many deterministic programs. Since performing probabilistic reasoning in this probabilistic program is much cheaper than querying the LLM, our approach allows sampling new programs without any additional GPU compute and with little CPU overhead. We instantiate our approach on benchmarks for code generation, code understanding and mathematical reasoning and report improvements in performance with fewer generations from the LLM.
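To make the core idea concrete, here is a minimal sketch, assuming the simplest possible construction: every token position of one generated program becomes an independent categorical choice over the emitted token and its most likely alternatives. This is not the paper's construction. The names `TokenChoice`, `build_probabilistic_program`, `count_programs`, `sample_program`, and `program_probability`, and the knobs `top_k` and `min_prob`, are all hypothetical. The sketch shows two things. First, keeping even two candidates at $m$ uncertain positions already represents $2^m$ deterministic programs. Second, new programs can be drawn on the CPU without another LLM call.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TokenChoice:
    """One token position of a generated program (hypothetical format):
    candidate tokens mapped to the LLM's next-token probabilities."""
    alternatives: dict[str, float]


def build_probabilistic_program(positions, top_k=2, min_prob=0.05):
    """Turn one generation into a list of categorical choices.

    At each position, keep the most likely token plus up to top_k - 1
    alternatives whose probability is at least min_prob, then renormalize.
    top_k and min_prob are illustrative knobs, not values from the paper.
    """
    program = []
    for pos in positions:
        ranked = sorted(pos.alternatives.items(), key=lambda kv: kv[1], reverse=True)
        # Always keep the top token so every position has at least one choice.
        kept = [ranked[0]] + [(t, p) for t, p in ranked[1:top_k] if p >= min_prob]
        total = sum(p for _, p in kept)
        program.append([(t, p / total) for t, p in kept])
    return program


def count_programs(program):
    """Number of deterministic programs represented: product of choices per position."""
    return math.prod(len(choices) for choices in program)


def sample_program(program, rng):
    """Draw one deterministic program (a tuple of tokens) on the CPU, with no LLM call."""
    return tuple(
        rng.choices([t for t, _ in choices], weights=[p for _, p in choices], k=1)[0]
        for choices in program
    )


def program_probability(program, tokens):
    """Probability of a specific token sequence under this independent per-position model."""
    prob = 1.0
    for choices, tok in zip(program, tokens):
        prob *= dict(choices).get(tok, 0.0)
    return prob


# Usage: hand-written (made-up) probabilities for the generation "x = a + b".
positions = [
    TokenChoice({"x": 0.99, "y": 0.01}),
    TokenChoice({" = ": 1.0}),
    TokenChoice({"a": 0.7, "b": 0.3}),
    TokenChoice({" + ": 0.6, " * ": 0.4}),
    TokenChoice({"b": 0.9, "a": 0.1}),
]
prog = build_probabilistic_program(positions)
print(count_programs(prog))  # 8 (= 1 * 1 * 2 * 2 * 2)
rng = random.Random(0)
for _ in range(3):
    print("".join(sample_program(prog, rng)))
```

Treating positions as independent is the main simplification here. It ignores that changing one token can make later tokens inconsistent, or can break the program's syntax. A practical system would need to account for such dependencies, for example by only varying positions where the model is uncertain and by filtering or executing the sampled programs.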