This paper introduces a novel framework, Hallucination-aware Intermediate Representation Edit (HIRE), to mitigate object hallucination in large vision-language models (LVLMs) by dynamically detecting and editing hallucination-prone intermediate representations. HIRE identifies these representations by comparing the model's output distribution with a distribution conditioned on visual grounding. By applying targeted edits to these representations, HIRE achieves state-of-the-art performance on hallucination benchmarks with minimal computational overhead compared to retraining or contrastive decoding.
Correcting a vision-language model's "hallucinations" is now as simple as pinpointing and editing the right intermediate representation, sidestepping costly retraining or dual inference.
Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still suffer from significant hallucination issues, producing outputs that contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both approaches are effective, retraining requires substantial training resources, and CD introduces dual-inference overhead; these factors hinder their practical applicability. To address these issues, we propose a framework that dynamically detects hallucination representations and performs hallucination-eliminating edits on them. With minimal additional computational cost, we achieve state-of-the-art performance on existing benchmarks. Extensive experiments demonstrate the effectiveness of our approach, highlighting its efficient and robust hallucination elimination and its strong controllability over hallucinations. Code is available at https://github.com/ASGO-MM/HIRE.
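The detect-then-edit loop described above might be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the KL-divergence trigger, the fixed threshold, the learned "hallucination direction", and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def detect_and_edit(hidden, logits_plain, logits_grounded, direction,
                    threshold=0.1, alpha=0.5):
    """If the plain output distribution diverges from the visually grounded
    one beyond `threshold`, flag `hidden` as hallucination-prone and damp
    its component along a (hypothetical) hallucination direction.

    Returns the (possibly edited) representation and a flag indicating
    whether an edit was applied.
    """
    p = softmax(logits_plain)
    q = softmax(logits_grounded)
    if kl_divergence(p, q) <= threshold:
        return hidden, False          # representation looks grounded; no edit
    d = direction / np.linalg.norm(direction)
    edited = hidden - alpha * np.dot(hidden, d) * d  # project out a fraction
    return edited, True
```

In a real LVLM this check would run on intermediate hidden states during decoding (e.g. via forward hooks on selected transformer layers), so the only extra cost is the divergence test and a cheap vector edit, rather than a second full inference pass as in contrastive decoding.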