Apple ML | Apr 13, 2026 | arXiv:2604.11720

On the Robustness of Watermarking for Autoregressive Image Generation

Andreas Müller, Denis Lukovnikov, Shingo Kodama, Minh Pham, Anubhav Jain, Jonathan Petit, Niv Cohen, Asja Fischer

AI Summary

This paper analyzes the robustness of watermarking schemes for autoregressive image generators and finds them vulnerable to both removal and forgery attacks. The authors assess existing attacks and introduce three new ones: vector-quantized regeneration, adversarial optimization, and frequency injection. These attacks can be effective with access to a single watermarked reference image and without access to the original model parameters or watermarking secrets. The results indicate that current schemes do not reliably support synthetic content detection for dataset filtering, and that they enable watermark mimicry, in which authentic images are manipulated to trigger false detection and are thereby excluded from future training data.

Key Contribution

Watermarks meant to identify images from autoregressive generators can be removed or forged using a single watermarked reference image, which lets attackers strip the mark from synthetic images or falsely flag real images as AI-generated.

Abstract

The proliferation of autoregressive (AR) image generators demands reliable detection and attribution of their outputs to mitigate misinformation, and to filter synthetic images from training data to prevent model collapse. To address this need, watermarking techniques specifically designed for AR models embed a subtle signal at generation time, enabling downstream verification through a corresponding watermark detector. In this work, we study these schemes and demonstrate their vulnerability to both watermark removal and forgery attacks. We assess existing attacks and further introduce three new attacks: (i) a vector-quantized regeneration removal attack, (ii) an adversarial optimization-based attack, and (iii) a frequency injection attack. Our evaluation reveals that removal and forgery attacks can be effective with access to a single watermarked reference image and without access to original model parameters or watermarking secrets. Our findings indicate that existing watermarking schemes for AR image generation do not reliably support synthetic content detection for dataset filtering. Moreover, they enable Watermark Mimicry, whereby authentic images can be manipulated to imitate a generator's watermark and trigger false detection to prevent their inclusion in future model training.

Computer Vision | Data Curation & Synthetic Data | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 45

Year: 2026

Venue: N/A
