This paper investigates how chain-of-thought (CoT) reasoning affects uncertainty quantification (UQ) in vision-language models (VLMs). It finds that CoT prompting and reasoning-trained models degrade the quality of most uncertainty estimates, even when they improve task accuracy. The proposed mechanism is implicit answer conditioning, where token probabilities reflect consistency with the reasoning trace rather than uncertainty about correctness. However, agreement-based consistency measures remain robust and often improve with reasoning, offering a practical alternative for UQ in reasoning-enabled VLMs.
Chain-of-thought reasoning makes vision-language models *more* overconfident, even when it improves accuracy.
Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves under reasoning, making it a practical choice for uncertainty estimation in reasoning-enabled VLMs.
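To make the contrast concrete, the following is a minimal sketch of the two kinds of confidence score discussed above. It is not the paper's implementation: the function names, the string normalization, and the length-normalized form of the token-probability score are all assumptions chosen for illustration. The agreement score is computed from several independently sampled final answers, while the token-probability score is computed from the log-probabilities of the answer tokens in a single generation.

```python
import math
from collections import Counter


def agreement_confidence(answers):
    """Return (majority_answer, fraction of sampled answers that match it).

    `answers` is a list of final answers from repeated sampling of the
    same question. Lowercasing and stripping whitespace is an assumed
    normalization step, not something specified by the source.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    majority, count = Counter(normalized).most_common(1)[0]
    return majority, count / len(normalized)


def token_probability_confidence(answer_token_logprobs):
    """Length-normalized probability of the answer tokens.

    This is the geometric mean of the per-token probabilities, one common
    (assumed) way to turn token log-probabilities into a single score.
    """
    if not answer_token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(answer_token_logprobs) / len(answer_token_logprobs)
    return math.exp(mean_logprob)


# Hypothetical data: five sampled answers to one multiple-choice question,
# and answer-token log-probabilities from a single reasoning-conditioned run.
samples = ["B", "b", "A", "B ", "C"]
answer, agreement = agreement_confidence(samples)
token_conf = token_probability_confidence([-0.01, -0.02])
print(answer, agreement, round(token_conf, 4))
```

In this made-up example the single-run token score is close to 1 while only three of five samples agree, which is the kind of gap the abstract attributes to implicit answer conditioning. The agreement score costs several generations per question, so the number of samples is a trade-off between compute and resolution of the estimate.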