HIT | Mar 18, 2026 | arXiv:2603.17761

Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs

Yuxin Liu, Fei Wang, Yiqi Nie, Junjie Chen, Zhangling Duan, Zhaohong Jia

AI Summary

This paper introduces the Semantic Consistent Evidence Pack (SCEP), a training-free framework for adapting large vision-language models (LVLMs) to image deepfake detection. SCEP clusters patch features into coherent groups and scores each patch by combining its semantic mismatch with the CLS token and its frequency- and noise-based anomalies, then keeps a compact set of the most suspicious patches. These patches serve as evidence that conditions a frozen LVLM for deepfake classification, outperforming strong baselines on diverse benchmarks without fine-tuning.

Key Contribution

Instead of fine-tuning, SCEP selects a compact set of suspicious patches as evidence for a frozen LVLM, outperforming strong baselines without any training.

Abstract

Image Deepfake Detection (IDD) separates manipulated images from authentic ones by spotting artifacts of synthesis or tampering. Although large vision-language models (LVLMs) offer strong image understanding, adapting them to IDD often demands costly fine-tuning and generalizes poorly to diverse, evolving manipulations. We propose the Semantic Consistent Evidence Pack (SCEP), a training-free LVLM framework that replaces whole-image inference with evidence-driven reasoning. SCEP mines a compact set of suspicious patch tokens that best reveal manipulation cues. It uses the vision encoder's CLS token as a global reference, clusters patch features into coherent groups, and scores patches with a fused metric combining CLS-guided semantic mismatch with frequency- and noise-based anomalies. To cover dispersed traces and avoid redundancy, SCEP samples a few high-confidence patches per cluster and applies grid-based NMS, producing an evidence pack that conditions a frozen LVLM for prediction. Experiments on diverse benchmarks show SCEP outperforms strong baselines without LVLM fine-tuning.
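The selection steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the fusion formula, the clustering algorithm, or the NMS rule, so the code assumes cosine distance to the CLS token for semantic mismatch, an equal-weight sum of min-max normalized scores, plain k-means, and greedy suppression within a Chebyshev radius on the patch grid. The function names, the precomputed `anomaly` input (standing in for the frequency- and noise-based cues), and the parameters `k`, `per_cluster`, `alpha`, and `radius` are all hypothetical.

```python
import numpy as np

def cls_mismatch(patches, cls):
    """Assumed mismatch score: 1 - cosine similarity to the CLS feature."""
    p = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    c = cls / (np.linalg.norm(cls) + 1e-8)
    return 1.0 - p @ c

def minmax(x):
    """Rescale scores to [0, 1] so different cues can be summed."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    return labels

def grid_nms(candidates, scores, grid_w, radius=1):
    """Greedy NMS on the patch grid: visit candidates by descending score
    and drop any that lies within `radius` cells of an already kept patch."""
    kept = []
    for i in sorted(candidates, key=lambda i: -scores[i]):
        r, c = divmod(i, grid_w)
        kept_rc = (divmod(j, grid_w) for j in kept)
        if all(max(abs(r - kr), abs(c - kc)) > radius for kr, kc in kept_rc):
            kept.append(i)
    return kept

def evidence_pack(patches, cls, anomaly, grid_w, k=4, per_cluster=2, alpha=0.5):
    """Return indices of selected patches (row-major on a grid of width grid_w)."""
    # Fused score: weighted sum of semantic mismatch and the anomaly cue.
    score = alpha * minmax(cls_mismatch(patches, cls)) + (1 - alpha) * minmax(anomaly)
    labels = kmeans(patches, k)
    candidates = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        top = idx[np.argsort(-score[idx])[:per_cluster]]  # best patches per cluster
        candidates.extend(int(i) for i in top)
    return grid_nms(candidates, score, grid_w)

# Toy usage with random features on a 16 x 16 patch grid.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16 * 16, 64))   # stand-in patch features
cls = patches.mean(0)                      # stand-in CLS feature
anomaly = rng.random(16 * 16)              # stand-in frequency/noise score
pack = evidence_pack(patches, cls, anomaly, grid_w=16)
```

In a real pipeline the selected indices would be mapped back to patch tokens or image crops and passed to the frozen LVLM together with the prompt. Taking the top patches per cluster rather than globally is what lets the pack cover traces spread across different regions, while the grid NMS step removes near-duplicate neighbors.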

Computer Vision, Multimodal Models, Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
