Mar 16, 2026arXiv:2603.15557

Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Lexiang Xiong, Qi Li, Jingwen Ye, Xinchao Wang

AI Summary

This paper introduces a multi-stage diagnostic framework for tracing hallucinations in VLMs by modeling generation as a dynamic cognitive trajectory within a Cognitive State Space. The core finding is a "geometric-information duality," where geometric abnormality in the space corresponds to high information-theoretic surprisal, enabling hallucination detection as a geometric anomaly detection problem. Evaluated on QA, reasoning, and captioning tasks, the framework achieves SOTA performance with weak supervision and robustness to calibration data contamination, allowing for causal attribution of failures to distinct pathological states.

Key Contribution

Hallucinations in VLMs aren't just errors, they're traceable pathologies in a model's cognitive trajectory, diagnosable via geometric anomalies in a learned cognitive state space.

Abstract

Vision-Language Models (VLMs) frequently "hallucinate" - generate plausible yet factually incorrect statements - posing a critical barrier to their trustworthy deployment. In this work, we propose a new paradigm for diagnosing hallucinations, recasting them from static output errors into dynamic pathologies of a model's computational cognition. Our framework is grounded in a normative principle of computational rationality, allowing us to model a VLM's generation as a dynamic cognitive trajectory. We design a suite of information-theoretic probes that project this trajectory onto an interpretable, low-dimensional Cognitive State Space. Our central discovery is a governing principle we term the geometric-information duality: a cognitive trajectory's geometric abnormality within this space is fundamentally equivalent to its high information-theoretic surprisal. Hallucination detection is counts as a geometric anomaly detection problem. Evaluated across diverse settings - from rigorous binary QA (POPE) and comprehensive reasoning (MME) to unconstrained open-ended captioning (MS-COCO) - our framework achieves state-of-the-art performance. Crucially, it operates with high efficiency under weak supervision and remains highly robust even when calibration data is heavily contaminated. This approach enables a causal attribution of failures, mapping observable errors to distinct pathological states: perceptual instability (measured by Perceptual Entropy), logical-causal failure (measured by Inferential Conflict), and decisional ambiguity (measured by Decision Entropy). Ultimately, this opens a path toward building AI systems whose reasoning is transparent, auditable, and diagnosable by design.

Eval Frameworks & Benchmarks Interpretability & Mechanistic Interp Multimodal Models

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Related Papers