Feb 24, 2026 | arXiv:2602.21441

Causal Decoding for Hallucination-Resistant Multimodal Large Language Models

Shiwei Tan, Hengyi Wang, Weiyi Qin, Qi Xu, Zhigang Hua

AI Summary

This paper introduces a causal decoding framework to mitigate object hallucination in Multimodal Large Language Models (MLLMs). The framework applies targeted causal interventions during decoding, attenuating spurious dependencies so that the model mentions fewer objects that are not in the image. Experiments on captioning and QA benchmarks show that the approach substantially lowers object-hallucination rates while preserving output quality, achieving state-of-the-art faithfulness.

Key Contribution

By intervening directly in MLLM decoding, this work lowers object-hallucination rates without degrading descriptive quality, whereas prior heuristic penalties, post-hoc correction, and generic decoding tweaks yielded only limited gains.

Abstract

Multimodal Large Language Models (MLLMs) deliver detailed responses on vision-language tasks, yet remain susceptible to object hallucination (introducing objects not present in the image), undermining reliability in practice. Prior efforts often rely on heuristic penalties, post-hoc correction, or generic decoding tweaks, which do not directly intervene in the mechanisms that trigger object hallucination and thus yield limited gains. To address this challenge, we propose a causal decoding framework that applies targeted causal interventions during generation to curb spurious object mentions. By reshaping the decoding dynamics to attenuate spurious dependencies, our approach reduces false object tokens while maintaining descriptive quality. Across captioning and QA benchmarks, our framework substantially lowers object-hallucination rates and achieves state-of-the-art faithfulness without degrading overall output quality.
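The abstract does not state how object-hallucination rates are computed. As a rough illustration of the quantity being reduced, the sketch below assumes a ground-truth list of objects present in the image and a list of objects extracted from the generated text; the function name `object_hallucination_rate` and both inputs are hypothetical and are not taken from the paper or its benchmarks.

```python
from typing import Iterable


def object_hallucination_rate(mentioned: Iterable[str], present: Iterable[str]) -> float:
    """Fraction of distinct mentioned objects that are absent from the image.

    Illustrative only: an assumed, simplified measure, not the paper's metric.
    """
    # Normalize names so that "Dog" and " dog" count as the same object.
    mentioned_set = {m.strip().lower() for m in mentioned}
    present_set = {p.strip().lower() for p in present}

    # A response that mentions no objects cannot hallucinate any.
    if not mentioned_set:
        return 0.0

    # Hallucinated objects are those mentioned but not present in the image.
    hallucinated = mentioned_set - present_set
    return len(hallucinated) / len(mentioned_set)


if __name__ == "__main__":
    caption_objects = ["dog", "frisbee", "car"]   # hypothetical caption output
    image_objects = ["dog", "frisbee", "grass"]   # hypothetical annotations
    print(object_hallucination_rate(caption_objects, image_objects))
```

Under this simplified definition, a lower value means fewer spurious object mentions. Exact string matching ignores synonyms and plurals, which a real benchmark would need to handle.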

Computer Vision | Multimodal Models | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
