Mar 3, 2026 · arXiv:2603.02882

SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

Xinjie Zhu, Zijing Zhao, Hui Jin, Qingxiao Guo, Yilong Ma, Yunhao Wang, Xiaobing Guo, Weifeng Zhang

AI Summary

The paper introduces SIGMark, a novel in-generation watermarking framework for video diffusion models that addresses the scalability and robustness limitations of existing methods. SIGMark achieves blind extraction by generating watermarked initial noise using a global set of frame-wise pseudo-random coding keys (GF-PRC), eliminating the need to store message-key pairs. The framework also incorporates a segment group-ordering (SGO) module tailored to causal 3D VAEs to enhance robustness against temporal disturbances.

Key Contribution

SIGMark makes in-generation watermarking for video diffusion practical at scale: it supports blind extraction with high bit accuracy, remains robust under temporal and spatial disturbances, and adds minimal overhead.

Abstract

Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. As an alternative to post-processing watermarks, which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind extraction, we generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), which reduces the cost of storing large-scale information while preserving the noise distribution and diversity required for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering (SGO) module tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.
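
To make the idea of blind extraction with a global set of frame-wise keys more concrete, the following is a toy sketch and not the paper's GF-PRC construction. It assumes a simple sign-based scheme: each message bit is XORed with a key bit derived from a single global seed and the frame index, and the result selects the sign of a half-normal sample, so the noise stays approximately standard Gaussian. The names `frame_keystream`, `embed`, and `extract`, the SHA-256 key schedule, and the majority-vote decoder are all hypothetical choices made for illustration. Diffusion inversion, the causal 3D VAE, and the SGO module are not modeled.

```python
import hashlib
import random


def frame_keystream(global_seed: int, frame_idx: int, n: int) -> list[int]:
    """Derive n pseudo-random key bits for one frame from a global seed.

    Hypothetical key schedule: only the global seed is stored,
    never a per-video message-key pair.
    """
    digest = hashlib.sha256(f"{global_seed}:{frame_idx}".encode()).digest()
    rng = random.Random(digest)
    return [rng.getrandbits(1) for _ in range(n)]


def embed(message: list[int], global_seed: int, num_frames: int,
          latents_per_frame: int, rng: random.Random) -> list[list[float]]:
    """Build watermarked initial noise, one list of latents per frame."""
    noise = []
    for f in range(num_frames):
        key = frame_keystream(global_seed, f, latents_per_frame)
        frame = []
        for i in range(latents_per_frame):
            # Masked bit looks uniform, so the sign is (pseudo-)random.
            bit = message[i % len(message)] ^ key[i]
            magnitude = abs(rng.gauss(0.0, 1.0))  # half-normal sample
            frame.append(magnitude if bit else -magnitude)
        noise.append(frame)
    return noise


def extract(noise: list[list[float]], global_seed: int, msg_len: int) -> list[int]:
    """Recover the message from (possibly perturbed) noise using only the seed."""
    votes = [0] * msg_len
    for f, frame in enumerate(noise):
        key = frame_keystream(global_seed, f, len(frame))
        for i, value in enumerate(frame):
            bit = (1 if value > 0 else 0) ^ key[i]
            votes[i % msg_len] += 1 if bit else -1
    # Majority vote over all repetitions of each message bit.
    return [1 if v > 0 else 0 for v in votes]
```

In this sketch, calling `embed` with a message and a seed, then `extract` with the same seed and message length, returns the original bits without looking up any stored message-key pair. Extraction here assumes the frame indices are still aligned; handling dropped or reordered frames is the problem the abstract attributes to the SGO module.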

Computer Vision · Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
