Mar 17, 2026 · arXiv:2603.16728

The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models

AI Summary

This paper investigates the impact of chain-of-thought (CoT) reasoning on uncertainty quantification (UQ) in vision-language models (VLMs). It finds that CoT prompting and reasoning-trained models degrade the quality of most uncertainty estimates due to implicit answer conditioning, where token probabilities reflect consistency with the reasoning trace rather than actual uncertainty. However, agreement-based consistency measures remain robust and improve with reasoning, offering a practical alternative for UQ in reasoning-enabled VLMs.

Key Contribution

Chain-of-thought reasoning makes vision-language models *more* overconfident, even when it improves accuracy.

Abstract

Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves under reasoning, making it a practical choice for uncertainty estimation in reasoning-enabled VLMs.
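The contrast between the two families of estimates mentioned in the abstract can be made concrete with a small sketch. This is an illustration only, not the paper's implementation: the function names, the length-normalized probability formula, the majority-vote agreement score, and all numeric values are assumptions chosen for the example.

```python
import math
from collections import Counter


def token_probability_confidence(answer_logprobs):
    """Geometric-mean probability of the final-answer tokens.

    Assumed estimator: exp(mean log-probability). When the answer is
    generated after a reasoning trace, these log-probabilities are
    conditioned on that trace.
    """
    if not answer_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(answer_logprobs) / len(answer_logprobs))


def agreement_consistency(sampled_answers):
    """Return (majority answer, fraction of samples agreeing with it).

    Assumed estimator: sample several independent generations, normalize
    the final answers, and measure agreement with the most common one.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in sampled_answers]
    majority, count = Counter(normalized).most_common(1)[0]
    return majority, count / len(normalized)


# Hypothetical values, made up for illustration (not results from the paper).
answer_logprobs = [-0.01, -0.02]       # answer tokens look near-certain
samples = ["B", "b", "C", "B", "A"]    # final answers from 5 sampled runs

confidence = token_probability_confidence(answer_logprobs)
majority, agreement = agreement_consistency(samples)
print(f"token-probability confidence: {confidence:.3f}")
print(f"agreement with majority answer '{majority}': {agreement:.2f}")
```

In this toy setup the answer tokens receive high probability once the trace has settled on a conclusion, while the sampled runs disagree with one another. That gap is the kind of discrepancy the abstract attributes to implicit answer conditioning.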

Eval Frameworks & Benchmarks, Multimodal Models, Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
