This paper introduces an adaptive conformal prediction method for LLMs that improves the factuality of generations by providing prompt-dependent uncertainty estimates. The approach extends conformal score transformation methods to enable prompt-dependent calibration, improving conditional coverage while retaining marginal coverage guarantees. Experiments with white-box models on long-form generation and multiple-choice question answering show that it significantly outperforms existing baselines in conditional coverage.
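As a rough illustration of prompt-dependent calibration for multiple-choice question answering, the sketch below uses normalized split conformal prediction. This is a standard score transformation that divides each nonconformity score by a prompt-dependent scale. It is a stand-in, not the paper's method. The scale function `difficulty` (one plus the entropy of the model's answer distribution), the helper names, and the toy probabilities are all hypothetical. The scale depends on the prompt only through the model's answer distribution and never on the true label, so the calibration scores stay exchangeable and the usual marginal guarantee carries over.

```python
import math
from typing import Dict, List, Sequence


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    # Small epsilon guards against float error nudging k up by one.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf  # too few calibration prompts for this alpha: keep every option
    return sorted(scores)[k - 1]


def difficulty(probs: Dict[str, float]) -> float:
    """Hypothetical prompt-dependent scale sigma(x): 1 + entropy of the answer distribution."""
    return 1.0 - sum(p * math.log(p) for p in probs.values() if p > 0)


def transformed_score(probs: Dict[str, float], label: str) -> float:
    # Base score 1 - p(label | x), divided by the prompt's scale.
    return (1.0 - probs[label]) / difficulty(probs)


def calibrate(cal_probs: List[Dict[str, float]], cal_labels: List[str], alpha: float) -> float:
    scores = [transformed_score(p, y) for p, y in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)


def prediction_set(probs: Dict[str, float], qhat: float) -> List[str]:
    # Keep option y iff 1 - p(y | x) <= qhat * sigma(x), i.e. its transformed score <= qhat.
    sigma = difficulty(probs)
    return sorted(y for y, p in probs.items() if 1.0 - p <= qhat * sigma)


# Toy calibration data (made up for illustration): answer distributions and true labels.
peaked = {"A": 0.97, "B": 0.01, "C": 0.01, "D": 0.01}
flat = {"A": 0.3, "B": 0.25, "C": 0.25, "D": 0.2}
cal_probs = [peaked] * 5 + [flat] * 4
cal_labels = ["A"] * 5 + ["A", "A", "D", "D"]
qhat = calibrate(cal_probs, cal_labels, alpha=0.2)

print(prediction_set(peaked, qhat))  # ['A']
print(prediction_set({"A": 0.35, "B": 0.3, "C": 0.2, "D": 0.15}, qhat))  # ['A', 'B']
```

With a single unscaled quantile, every prompt would get the same cutoff on 1 - p(y | x). Here the cutoff qhat * sigma(x) widens when the answer distribution is spread out, so set size follows prompt difficulty while marginal coverage is preserved.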
LLMs can be made more reliably factual by adapting uncertainty estimates to the specific prompt, boosting conditional coverage without sacrificing marginal coverage guarantees.
Large language models (LLMs) are prone to generating factually incorrect outputs. Recent work has applied conformal prediction to provide uncertainty estimates and statistical guarantees for the factuality of LLM generations. However, existing approaches are typically not prompt-adaptive, limiting their ability to capture input-dependent variability. As a result, they may filter out too few items (leading to over-coverage) or too many (under-coverage) for a given task or prompt. We propose an adaptive conformal prediction approach that extends conformal score transformation methods to LLMs, with applications to long-form generation and multiple-choice question answering. This enables prompt-dependent calibration, retaining marginal coverage guarantees while improving conditional coverage. In addition, the approach naturally supports selective prediction, allowing unreliable claims or answer choices to be filtered out in downstream applications. We evaluate our approach on multiple white-box models across diverse domains and show that it significantly outperforms existing baselines in terms of conditional coverage.
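The selective-prediction use case for long-form generation can be sketched in the same spirit. The sketch assumes that each calibration prompt comes with model confidences for its extracted claims and binary correctness labels. The per-prompt score is the confidence of the most confident false claim, which is the smallest threshold that removes every false claim, in the style of earlier conformal factuality methods. The prompt-dependent part is a hypothetical shift by the response's mean claim confidence (`prompt_offset`), chosen only for illustration. The paper's actual transformation may differ. Under exchangeability, keeping only claims above `qhat + prompt_offset(...)` ensures that, marginally over prompts, all retained claims are correct with probability at least 1 - alpha.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

LabeledClaim = Tuple[str, float, bool]  # (claim text, model confidence, is the claim true?)
Claim = Tuple[str, float]               # (claim text, model confidence) at test time


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)  # epsilon guards against float round-up
    return math.inf if k > n else sorted(scores)[k - 1]  # inf: filter out every claim


def prompt_offset(confidences: Sequence[float]) -> float:
    """Hypothetical prompt-dependent shift: mean claim confidence of the response."""
    return mean(confidences) if confidences else 0.0


def prompt_score(claims: List[LabeledClaim]) -> float:
    # Smallest threshold that removes every false claim, minus the prompt's offset.
    false_confs = [c for _, c, ok in claims if not ok]
    worst = max(false_confs) if false_confs else -math.inf
    return worst - prompt_offset([c for _, c, _ in claims])


def calibrate(cal_prompts: List[List[LabeledClaim]], alpha: float) -> float:
    return conformal_quantile([prompt_score(p) for p in cal_prompts], alpha)


def filter_claims(claims: List[Claim], qhat: float) -> List[str]:
    # Keep only claims whose confidence exceeds the prompt-specific threshold.
    tau = qhat + prompt_offset([c for _, c in claims])
    return [text for text, c in claims if c > tau]


# Toy calibration set (made-up numbers).
cal = [
    [("a", 0.875, True), ("b", 0.75, True), ("c", 0.25, False)],
    [("a", 0.75, True), ("b", 0.5, False), ("c", 0.25, True)],
    [("a", 0.875, True), ("b", 0.75, True)],
]
qhat = calibrate(cal, alpha=0.25)

print(filter_claims([("x", 0.875), ("y", 0.75), ("z", 0.25)], qhat))  # ['x', 'y']
print(filter_claims([("p", 0.5), ("q", 0.375), ("r", 0.25)], qhat))   # ['p']
```

On this toy data, the same procedure without the offset calibrates a fixed threshold of 0.5 and would discard every claim in the second, less confident response. The shifted threshold moves with the response's overall confidence instead.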