This paper introduces Multi-modal Evidence Grounding (MEG), a novel metric that quantifies the semantic contribution of retrieved multimodal evidence in Retrieval-Augmented Generation (RAG) systems by focusing on high-IDF tokens. Based on MEG, the authors propose MEG-RAG, a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth. Experiments on the M$^2$RAG benchmark demonstrate that MEG-RAG outperforms strong baselines by prioritizing high-value content according to semantic grounding, improving both accuracy and multimodal consistency.
Semantic grounding, not token probability, is the key to better multimodal RAG.
Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides superficial relevance. Existing metrics often rely on heuristic position-based confidence, which fails to capture the informational density of multimodal entities. To address this, we propose Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. Unlike standard confidence measures, MEG utilizes Semantic Certainty Anchoring, focusing on high-IDF information-bearing tokens that better capture the semantic core of the answer. Building on MEG, we introduce MEG-RAG, a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth. By prioritizing high-value content based on semantic grounding rather than token probability distributions, MEG-RAG improves the accuracy and multimodal consistency of generated outputs. Extensive experiments on the M$^2$RAG benchmark show that MEG-RAG consistently outperforms strong baselines and demonstrates robust generalization across different teacher models.
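The core idea of Semantic Certainty Anchoring, weighting an answer's rare, information-bearing tokens more heavily than common ones, can be illustrated with a minimal sketch. The anchor-selection rule (a quantile cutoff over IDF values) and the coverage-based score below are assumptions for illustration, not the paper's exact formulation of MEG:

```python
import math
from collections import Counter

def idf_table(corpus):
    """Inverse document frequency for each token in a corpus of
    tokenized documents (a list of token lists)."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each token once per document
    return {t: math.log(n / df[t]) for t in df}

def meg_score(evidence_tokens, answer_tokens, idf, anchor_quantile=0.5):
    """Illustrative MEG-style score: the fraction of the answer's IDF
    mass, restricted to high-IDF 'anchor' tokens, that is covered by
    the retrieved evidence. Returns a value in [0, 1]."""
    answer_vocab = set(answer_tokens)
    weights = sorted(idf.get(t, 0.0) for t in answer_vocab)
    if not weights:
        return 0.0
    # Hypothetical anchoring rule: keep tokens at or above the
    # chosen IDF quantile as semantic anchors.
    cutoff = weights[min(int(len(weights) * anchor_quantile), len(weights) - 1)]
    anchors = {t for t in answer_vocab if idf.get(t, 0.0) >= cutoff}
    total = sum(idf.get(t, 0.0) for t in anchors)
    if total == 0.0:
        return 0.0
    evidence_vocab = set(evidence_tokens)
    covered = sum(idf.get(t, 0.0) for t in anchors if t in evidence_vocab)
    return covered / total
```

Under this sketch, evidence that shares the answer's rare anchor tokens scores high even if it misses frequent function words, matching the abstract's claim that informational density, not surface overlap, should drive reranking.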