This paper introduces Multi-Region Fusion Decoding (MRFD), a novel training-free decoding method designed to mitigate hallucinations in Large Vision-Language Models (LVLMs) by explicitly modeling inter-region consistency. MRFD leverages cross-attention to identify salient image regions, generates region-specific responses, and then uses Jensen-Shannon Divergence (JSD) to compute reliability weights that guide a consistency-aware fusion of these predictions. Experiments demonstrate that MRFD significantly reduces hallucinations and improves response factuality across various LVLMs and benchmarks without requiring any model training or fine-tuning.
LVLMs can be made more truthful without retraining: a new decoding strategy fuses region-specific predictions based on their consistency, slashing hallucinations.
Large Vision-Language Models (LVLMs) have shown strong performance across multimodal tasks. However, they often produce hallucinations: text that is inconsistent with the visual input, owing to their limited ability to verify information across different regions of an image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates an initial response for each, and computes reliability weights based on the Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality without requiring any model updates.
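The consistency-aware fusion step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each region yields a next-token probability distribution, scores each region by its mean pairwise JSD against the others, and converts those scores into fusion weights via a softmax over negative divergence (the `temperature` parameter and the softmax form are assumptions for illustration).

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))  # KL divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def fuse_region_predictions(region_probs, temperature=1.0):
    """Weight each region by how consistent its next-token
    distribution is with the other regions, then fuse."""
    n = len(region_probs)
    # mean pairwise JSD of each region against all others
    avg_div = np.array([
        np.mean([jsd(region_probs[i], region_probs[j])
                 for j in range(n) if j != i])
        for i in range(n)
    ])
    # lower divergence -> higher reliability weight
    w = np.exp(-avg_div / temperature)
    w /= w.sum()
    fused = np.einsum('i,iv->v', w, np.asarray(region_probs, dtype=float))
    return fused / fused.sum(), w
```

With three regions where two agree and one is an outlier, the outlier receives the smallest weight, so its (likely hallucinated) prediction contributes least to the fused distribution.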