Feb 24, 2026 | arXiv:2602.20680

Vanishing Watermarks: Diffusion-Based Image Editing Undermines Robust Invisible Watermarking

Fan Guo, Jiyu Kang, Qi Ming, Emily Davis, Finn Carter

AI Summary

This paper investigates the vulnerability of robust invisible watermarking schemes to diffusion-based image editing, demonstrating that diffusion models can effectively erase watermarks designed to withstand conventional distortions. Theoretically, the authors prove that as an image undergoes sufficient diffusion transformations, the mutual information between the watermarked image and the hidden payload approaches zero, so decoding fails. Empirically, they show that diffusion edits, especially a guided attack that targets the watermark signal during generation, reduce watermark recovery rates to near zero for state-of-the-art methods such as StegaStamp, TrustMark, and VINE, while preserving image quality.

Key Contribution

Diffusion-based image regeneration and editing can reduce watermark recovery to near zero even for state-of-the-art robust invisible watermarks, exposing a fundamental vulnerability of current schemes to generative model-based image manipulation.

Abstract

Robust invisible watermarking schemes aim to embed hidden information into images such that the watermark survives common manipulations. However, powerful diffusion-based image generation and editing techniques now pose a new threat to these watermarks. In this paper, we present a comprehensive theoretical and empirical analysis demonstrating that diffusion models can effectively erase robust watermarks even when those watermarks were designed to withstand conventional distortions. We show that a diffusion-driven image regeneration process, which leverages generative models to recreate an image, can remove embedded watermarks while preserving the image's perceptual content. Furthermore, we introduce a guided diffusion-based attack that explicitly targets the embedded watermark signal during generation, significantly degrading watermark detectability. Theoretically, we prove that as an image undergoes sufficient diffusion transformations, the mutual information between the watermarked image and the hidden payload approaches zero, leading to inevitable decoding failure. Experimentally, we evaluate multiple state-of-the-art watermarking methods (including deep learning-based schemes like StegaStamp, TrustMark, and VINE) and demonstrate that diffusion edits yield near-zero watermark recovery rates after attack, while maintaining high visual fidelity of the regenerated images. Our findings reveal a fundamental vulnerability in current robust watermarking techniques against generative model-based edits, underscoring the need for new strategies to ensure watermark resilience in the era of powerful diffusion models.
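The mutual-information claim in the abstract can be made concrete with a toy experiment. The sketch below is not the paper's method: it assumes a simple spread-spectrum watermark (hypothetical helpers `make_carriers`, `embed`, `decode`) rather than StegaStamp, TrustMark, or VINE, a random Gaussian vector standing in for an image, and the standard DDPM-style forward noising step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps as the "diffusion transformation". It decodes directly from the noised sample and skips the regeneration step entirely; a denoiser that only sees x_t cannot add information about the payload (data processing inequality), so the trend is still indicative.

```python
import math
import random


def make_carriers(n_bits, n_pixels, rng):
    """One pseudo-random +/-1 carrier pattern per payload bit."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(n_pixels)]
            for _ in range(n_bits)]


def embed(host, bits, carriers, strength):
    """Add +strength*carrier for a 1 bit and -strength*carrier for a 0 bit."""
    marked = list(host)
    for bit, carrier in zip(bits, carriers):
        sign = 1.0 if bit else -1.0
        for i, c in enumerate(carrier):
            marked[i] += strength * sign * c
    return marked


def decode(image, carriers):
    """Recover each bit from the sign of the correlation with its carrier."""
    return [1 if sum(p * c for p, c in zip(image, carrier)) > 0 else 0
            for carrier in carriers]


def forward_diffuse(image, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise * rng.gauss(0.0, 1.0) for p in image]


def mean_bit_accuracy(alpha_bar, trials=20, seed=0,
                      n_bits=16, n_pixels=4096, strength=0.2):
    """Average fraction of payload bits decoded correctly after noising."""
    rng = random.Random(seed)
    # Toy stand-in for an image: i.i.d. Gaussian "pixels" (an assumption).
    host = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    carriers = make_carriers(n_bits, n_pixels, rng)
    marked = embed(host, bits, carriers, strength)
    correct = 0
    for _ in range(trials):
        noised = forward_diffuse(marked, alpha_bar, rng)
        decoded = decode(noised, carriers)
        correct += sum(d == b for d, b in zip(decoded, bits))
    return correct / (trials * n_bits)


if __name__ == "__main__":
    # alpha_bar = 1.0 means no noise; smaller values mean deeper diffusion.
    for alpha_bar in (1.0, 0.5, 0.1, 0.01, 0.0001):
        acc = mean_bit_accuracy(alpha_bar)
        print(f"alpha_bar={alpha_bar:<7} bit accuracy={acc:.3f}")
```

Under these assumptions, bit accuracy should start at 1.0 with no noise and drift toward chance level (0.5) as `alpha_bar` shrinks, which mirrors the abstract's claim at a much smaller scale. The learned watermark decoders evaluated in the paper are far more robust than this correlation decoder, so the noise level at which they fail will differ.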

Computer Vision Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 103

Year: 2026

Venue: N/A
