This paper demonstrates that reconstruction-based detectors for diffusion-generated images are highly vulnerable to adversarial perturbations: imperceptible changes to the input drive detection accuracy to near zero. The authors systematically evaluate three representative detectors across four generative models, showing that white-box attacks are effective and that they transfer between detectors, which makes black-box attacks possible. They further show that standard adversarial defenses offer limited protection, which they attribute to the low signal-to-noise ratio of attacked samples as perceived by the detectors.
Reconstruction-based AI-generated image detectors, despite their promise, can be completely fooled by adding imperceptible noise, revealing a critical security flaw.
Recently, detecting AI-generated images produced by diffusion-based models has attracted increasing attention due to their potential threat to safety. Among existing approaches, reconstruction-based methods have emerged as a prominent paradigm for this task. However, we find that such methods exhibit severe security vulnerabilities to adversarial perturbations; that is, by adding imperceptible adversarial perturbations to input images, the detection accuracy of classifiers collapses to near zero. To verify this threat, we present a systematic evaluation of the adversarial robustness of three representative detectors across four diverse generative backbone models. First, we construct adversarial attacks in white-box scenarios, which degrade the performance of all well-trained detectors. Moreover, we find that these attacks demonstrate transferability; specifically, attacks crafted against one detector can be transferred to others, indicating that adversarial attacks on detectors can also be constructed in a black-box setting. Finally, we assess common countermeasures and find that standard defense methods against adversarial attacks provide limited mitigation. We attribute these failures to the low signal-to-noise ratio (SNR) of attacked samples as perceived by the detectors. Overall, our results reveal fundamental security limitations of reconstruction-based detectors and highlight the need to rethink existing detection strategies.
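The white-box attack described above can be illustrated with a deliberately small sketch. It is not the paper's setup: the "detector" here is a hypothetical stand-in that reconstructs an input by projecting it onto a random linear subspace and flags it as generated when the reconstruction error is below a threshold. The names `make_projector`, `recon_error`, `is_flagged_generated`, and `pgd_attack`, along with every numeric value, are assumptions chosen for illustration. The attack is a plain projected-gradient (PGD-style) ascent on the reconstruction error under an L-infinity budget `eps`.

```python
import numpy as np


def make_projector(dim, rank, rng):
    """Projection matrix onto a random rank-dimensional subspace.

    Toy stand-in for a generative model's reconstruction step (an assumption,
    not a real diffusion model).
    """
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q @ q.T  # symmetric and idempotent


def recon_error(x, proj):
    """Squared L2 distance between x and its reconstruction proj @ x."""
    residual = x - proj @ x
    return float(residual @ residual)


def is_flagged_generated(x, proj, threshold):
    """Toy decision rule: low reconstruction error means 'generated'."""
    return recon_error(x, proj) < threshold


def pgd_attack(x, proj, eps, step, iters):
    """Raise the reconstruction error while keeping |x_adv - x| <= eps."""
    x_adv = x.copy()
    for _ in range(iters):
        # Gradient of ||(I - P) x||^2 with respect to x is 2 (I - P) x.
        grad = 2.0 * (x_adv - proj @ x_adv)
        x_adv = x_adv + step * np.sign(grad)
        # Project back into the L-infinity ball around the original input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


rng = np.random.default_rng(0)
dim, rank, threshold = 256, 32, 0.1  # hypothetical values
proj = make_projector(dim, rank, rng)

# A "generated" sample: a point on the subspace plus slight off-subspace noise.
x = proj @ rng.standard_normal(dim) + 0.01 * rng.standard_normal(dim)
x_adv = pgd_attack(x, proj, eps=0.05, step=0.01, iters=10)

print("flagged before attack:", is_flagged_generated(x, proj, threshold))
print("flagged after attack: ", is_flagged_generated(x_adv, proj, threshold))
print("max |perturbation|:   ", float(np.max(np.abs(x_adv - x))))
```

In this toy, the detector's decision rests on a small residual, so a bounded perturbation aimed directly at that residual is enough to move the sample across the threshold. Real reconstruction-based detectors use learned classifiers and diffusion models rather than a linear projection, so the sketch conveys only the mechanism, not the magnitudes reported in the paper.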