RUC | Tencent AI | Apr 14, 2026 | arXiv:2604.12781

Fragile Reconstruction: Adversarial Vulnerability of Reconstruction-Based Detectors for Diffusion-Generated Images

Haoyang Jiang, Mingyang Yi, Shaolei Zhang, Junxian Cai, Qingbin Liu, Ju Fan

AI Summary

This paper demonstrates that reconstruction-based detectors for diffusion-generated images are highly vulnerable to adversarial perturbations: imperceptible changes to the input drive detection accuracy to near zero. The authors systematically evaluate three representative detectors across four generative models, showing that white-box attacks are effective and that they transfer to black-box settings. They further show that standard adversarial defenses offer limited protection, which they attribute to the low signal-to-noise ratio of attacked samples as perceived by the detectors.

Key Contribution

Reconstruction-based AI-generated image detectors, despite their promise, can be fooled almost completely by imperceptible adversarial perturbations, revealing a critical security flaw.

Abstract

Recently, detecting AI-generated images produced by diffusion-based models has attracted increasing attention due to their potential threat to safety. Among existing approaches, reconstruction-based methods have emerged as a prominent paradigm for this task. However, we find that such methods exhibit severe security vulnerabilities to adversarial perturbations; that is, by adding imperceptible adversarial perturbations to input images, the detection accuracy of classifiers collapses to near zero. To verify this threat, we present a systematic evaluation of the adversarial robustness of three representative detectors across four diverse generative backbone models. First, we construct adversarial attacks in white-box scenarios, which degrade the performance of all well-trained detectors. Moreover, we find that these attacks demonstrate transferability; specifically, attacks crafted against one detector can be transferred to others, indicating that adversarial attacks on detectors can also be constructed in a black-box setting. Finally, we assess common countermeasures and find that standard defense methods against adversarial attacks provide limited mitigation. We attribute these failures to the low signal-to-noise ratio (SNR) of attacked samples as perceived by the detectors. Overall, our results reveal fundamental security limitations of reconstruction-based detectors and highlight the need to rethink existing detection strategies.
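To make the white-box threat described above concrete, here is a minimal toy sketch, not the authors' attack or any of the evaluated detectors. It assumes a detector that flags an input as generated when its reconstruction error falls below a threshold, and it replaces the diffusion reconstruction with a linear projection onto a random subspace. All names (`make_projector`, `pgd_linf`, `is_flagged_generated`) and all parameter values (`eps`, `step`, `tau`) are hypothetical choices for illustration only.

```python
import numpy as np


def make_projector(dim, rank, rng):
    """Orthogonal projector onto a random subspace (toy stand-in for 'reconstruction')."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q @ q.T


def recon_error(x, proj):
    """Squared L2 distance between x and its toy reconstruction proj @ x."""
    r = x - proj @ x
    return float(r @ r)


def recon_error_grad(x, proj):
    """Gradient of recon_error w.r.t. x; equals 2 * residual for an orthogonal projector."""
    return 2.0 * (x - proj @ x)


def pgd_linf(x, proj, eps, step, iters, rng):
    """Sign-gradient ascent on the reconstruction error inside an L-inf ball of radius eps."""
    # Random start: the gradient vanishes wherever x reconstructs perfectly.
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(recon_error_grad(x_adv, proj))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv


def is_flagged_generated(x, proj, tau):
    """Toy decision rule (assumption): low reconstruction error means 'generated'."""
    return recon_error(x, proj) < tau


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proj = make_projector(dim=64, rank=8, rng=rng)
    x_gen = proj @ rng.standard_normal(64)  # toy "generated" sample: lies in the subspace
    x_adv = pgd_linf(x_gen, proj, eps=0.03, step=0.01, iters=20, rng=rng)
    print("clean error:", recon_error(x_gen, proj))
    print("adversarial error:", recon_error(x_adv, proj))
    print("max |perturbation|:", np.max(np.abs(x_adv - x_gen)))
```

The sketch omits clipping to a valid pixel range and uses an analytic gradient. Against a real detector the gradient would have to be obtained by differentiating through the reconstruction pipeline, and the paper's exact attack configuration is not specified in this abstract.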

Computer Vision | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
