This paper investigates the vulnerability of deepfake detectors to images refined by generative AI systems using policy-compliant prompts. It finds that generative AI's ability to articulate and externalize authenticity criteria allows adversaries to refine deepfakes to evade detection while preserving identity and improving perceptual quality. The study highlights that commercial chatbot services pose a greater risk than open-source models due to their superior realism and semantic controllability.
Generative AI's ability to reason about and refine images based on authenticity criteria inadvertently creates a powerful evasion strategy that renders current deepfake detectors ineffective.
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.
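The evasion loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `elicit_authenticity_criteria`, `refine`, and `detector_score` are hypothetical stand-ins for the chatbot, image-refinement, and deepfake-detector services involved, stubbed with assumed behavior so the control flow is runnable.

```python
def elicit_authenticity_criteria() -> list[str]:
    """Stand-in for asking the chatbot, via a benign policy-compliant
    prompt, which cues mark an image as fake. Stubbed with examples."""
    return ["unnatural skin texture", "inconsistent lighting", "blurred hairline"]

def refine(image: dict, criterion: str) -> dict:
    """Stand-in for a semantic-preserving refinement request targeting
    one externalized criterion. Stub: record the fix and assume the
    detector's fake score drops by a fixed factor."""
    refined = dict(image)
    refined["fixes"] = image.get("fixes", []) + [criterion]
    refined["fake_score"] = image["fake_score"] * 0.6  # assumed effect
    return refined

def detector_score(image: dict) -> float:
    """Stand-in for the deepfake detector's P(fake)."""
    return image["fake_score"]

def evade(image: dict, threshold: float = 0.5) -> dict:
    """Reuse the externalized authenticity criteria, one by one, as
    refinement objectives until the detector score falls below threshold."""
    for criterion in elicit_authenticity_criteria():
        if detector_score(image) < threshold:
            break
        image = refine(image, criterion)
    return image

result = evade({"fake_score": 0.95})
print(detector_score(result))
```

The key point the sketch captures is structural: the attacker never needs model internals or adversarial perturbations, only the detector-relevant criteria that the chatbot itself articulates.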