Apr 15, 2026 · arXiv:2604.13660

VRAG-DFD: Verifiable Retrieval-Augmentation for MLLM-based Deepfake Detection

Hui Han, Shunli Wang, Yandan Zhao, Taiping Yao, Shouhong Ding

AI Summary

This paper introduces VRAG-DFD, a framework that combines Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL) to improve MLLM-based deepfake detection. The authors address the lack of forgery knowledge in existing MLLMs by building two RAG-based datasets: a Forensic Knowledge Database (FKD) and a Forensic Chain-of-Thought Dataset (F-CoT). A three-stage training process (Alignment -> SFT -> GRPO) cultivates critical reasoning, and VRAG-DFD reports state-of-the-art or competitive performance on deepfake detection generalization testing.

Key Contribution

Supplying retrieved forensic knowledge to an MLLM, and training it with RL to reason critically about that knowledge through chain-of-thought, can significantly boost its ability to generalize in deepfake detection.

Abstract

In Deepfake Detection (DFD) tasks, researchers have proposed two types of MLLM-based methods: complementary combination with small DFD detectors, or static forgery knowledge injection. The lack of professional forgery knowledge hinders the performance of these DFD-MLLMs. To solve this, we consider two questions: how to provide high-quality, relevant forgery knowledge for MLLMs, and how to endow MLLMs with critical reasoning abilities given noisy reference information. We offer preliminary answers to both questions by combining Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL). Using these techniques, we propose the VRAG-DFD framework, which offers accurate dynamic forgery knowledge retrieval and strong critical reasoning capabilities. Specifically, in terms of data, we construct two datasets with RAG: the Forensic Knowledge Database (FKD) for DFD knowledge annotation, and the Forensic Chain-of-Thought Dataset (F-CoT) for critical CoT construction. In terms of model training, we adopt a three-stage training method (Alignment->SFT->GRPO) to gradually cultivate the critical reasoning ability of the MLLM. In terms of performance, VRAG-DFD achieves SOTA and competitive performance on DFD generalization testing.
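
The abstract names GRPO as the final training stage but does not describe it. As a rough illustration only, the sketch below shows the group-relative advantage that GRPO is generally built on: several responses are sampled for the same input, each is scored, and each reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, the use of the population standard deviation, and the epsilon value are all assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: `rewards` holds one scalar score per response
    sampled for the same input. `eps` guards against division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one image,
# e.g. 1.0 for a correct real/fake verdict and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Responses scoring above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.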

Computer Vision · Multimodal Models · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
