PKU | University of International Business and Economics | USC | Apr 20, 2026 | arXiv:2604.18112

Retrieval-Augmented Multimodal Model for Fake News Detection

AI Summary

The paper introduces the Retrieval-Augmented Multimodal Model (RAMM) to address two challenges in multimodal fake news detection: capturing cross-instance narrative consistency and compensating for a lack of domain-specific knowledge. RAMM uses a Multimodal Large Language Model (MLLM) backbone with an Abstract Narrative Alignment Module to extract narrative consistency and a Semantic Representation Alignment Module to mimic human-like analogical reasoning. Experiments on three public datasets demonstrate the effectiveness of RAMM.

Key Contribution

By mimicking human reasoning through instance-based analogy, RAMM significantly improves multimodal fake news detection, especially in data-scarce domains.

Abstract

In recent years, multimodal, multi-domain fake news detection has garnered increasing attention. Nevertheless, this direction presents two significant challenges: (1) Failure to Capture Cross-Instance Narrative Consistency: existing models usually evaluate each news item in isolation, fail to capture cross-instance narrative consistency, and thus struggle to address the spread of cluster-based fake news driven by social media; (2) Lack of Domain-Specific Knowledge for Reasoning: conventional models, which rely solely on knowledge encoded in their parameters during training, struggle to generalize to new or data-scarce domains (e.g., emerging events or niche topics). To tackle these challenges, we introduce the Retrieval-Augmented Multimodal Model for Fake News Detection (RAMM). First, RAMM employs a Multimodal Large Language Model (MLLM) as its backbone to capture cross-modal semantic information from news samples. Second, RAMM incorporates an Abstract Narrative Alignment Module. This component adaptively extracts abstract narrative consistency from diverse instances across distinct domains, aggregates relevant knowledge, and thereby enables the modeling of high-level narrative information. Finally, RAMM introduces a Semantic Representation Alignment Module, which aligns the model's decision-making paradigm with that of humans: specifically, it shifts the model's reasoning process from direct inference on multimodal features to an instance-based analogical reasoning process. Extensive experimental results on three public datasets validate the efficacy of our proposed approach. Our code is available at the following link: https://github.com/li-yiheng/RAMM
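
To make the idea of instance-based analogical reasoning concrete, the following is a minimal sketch, not the RAMM implementation. It assumes each news item has already been reduced to an embedding vector (in RAMM this role would be played by the MLLM backbone), and it scores a query by a similarity-weighted vote over the labels of its nearest stored instances. The function names, the toy three-dimensional embeddings, and the `temperature` parameter are all hypothetical and chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, memory, k=3):
    """Return the k (similarity, label) pairs most similar to the query."""
    scored = [(cosine(query, emb), label) for emb, label in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def analogical_fake_score(query, memory, k=3, temperature=0.1):
    """Softmax-weighted vote over retrieved labels (1 = fake, 0 = real)."""
    neighbours = retrieve(query, memory, k)
    if not neighbours:
        return 0.5  # no evidence either way
    weights = [math.exp(sim / temperature) for sim, _ in neighbours]
    total = sum(weights)
    return sum(w * label for w, (_, label) in zip(weights, neighbours)) / total

# Hypothetical memory of previously labeled instances (toy embeddings).
memory = [
    ([0.9, 0.1, 0.0], 1),
    ([0.8, 0.2, 0.1], 1),
    ([0.0, 0.9, 0.4], 0),
    ([0.1, 0.8, 0.5], 0),
]

fake_like = analogical_fake_score([0.85, 0.15, 0.05], memory, k=2)
real_like = analogical_fake_score([0.05, 0.85, 0.45], memory, k=2)
print(fake_like, real_like)
```

In this sketch the decision depends on retrieved neighbours rather than on the query features alone, so adding labeled instances from a new domain to `memory` changes predictions without retraining. That mirrors, in a very simplified form, the motivation the abstract gives for retrieval in data-scarce domains.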

Multimodal Models | Natural Language Processing | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 52

Year: 2026

Venue: N/A
