IIS Academia Sinica, NTU Taiwan, NYCU | Mar 5, 2026 | arXiv:2603.05310

Latent-Mark: An Audio Watermark Robust to Neural Resynthesis

Yen-Shan Chen, Shih-Yu Lai, Ying-Jung Tsou, Yi-Cheng Lin, Bing-Yu Chen, Yun-Nung Chen, Hung-yi Lee, Shang-Tse Chen

AI Summary

Latent-Mark, a novel zero-bit audio watermarking framework, embeds watermarks in the latent space of audio codecs to achieve robustness against neural resynthesis, a weakness of prior methods. The method optimizes audio waveforms to induce detectable directional shifts in the encoded latent representation while maintaining imperceptibility. Cross-Codec Optimization is introduced to prevent overfitting by jointly optimizing across multiple surrogate codecs, leading to state-of-the-art zero-shot transferability to unseen codecs and resilience against DSP attacks.

Key Contribution

Audio watermarks can now survive neural resynthesis, thanks to a latent space embedding technique that resists semantic compression by modern audio codecs.

Abstract

While existing audio watermarking techniques have achieved strong robustness against traditional digital signal processing (DSP) attacks, they remain vulnerable to neural resynthesis. This occurs because modern neural audio codecs act as semantic filters and discard the imperceptible waveform variations used in prior watermarking methods. To address this limitation, we propose Latent-Mark, the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the codec's invariant latent space. We achieve this by optimizing the audio waveform to induce a detectable directional shift in its encoded latent representation, while constraining perturbations to align with the natural audio manifold to ensure imperceptibility. To prevent overfitting to a single codec's quantization rules, we introduce Cross-Codec Optimization, jointly optimizing the waveform across multiple surrogate codecs to target shared latent invariants. Extensive evaluations demonstrate robust zero-shot transferability to unseen neural codecs, achieving state-of-the-art resilience against traditional DSP attacks while preserving perceptual imperceptibility. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
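The embedding idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: real neural codecs are replaced by hypothetical linear encoders that share a common component, the secret latent direction is a random unit vector, and imperceptibility is approximated by an L2 penalty on the perturbation rather than a manifold constraint. All names (`make_codec`, `embed`, `score`) and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def mat_vec(W, x):
    """Encode: multiply a (d x n) matrix, stored as a list of rows, by a length-n vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mat_t_vec(W, z):
    """Multiply the transpose of W by a length-d vector z."""
    return [sum(W[i][j] * z[i] for i in range(len(W))) for j in range(len(W[0]))]


def make_codec(shared, rng, noise=0.3):
    """Hypothetical surrogate encoder: a shared linear map plus codec-specific noise."""
    return [[s + noise * rng.gauss(0, 1) for s in row] for row in shared]


def score(W, x, direction):
    """Detection statistic: projection of the encoded latent onto the secret direction."""
    return sum(zi * di for zi, di in zip(mat_vec(W, x), direction))


def embed(audio, codecs, direction, steps=50, lr=0.05, lam=5.0):
    """Gradient ascent on the waveform perturbation, averaged across surrogate codecs."""
    n = len(audio)
    # Mean over codecs of the gradient of <W_k x, direction> with respect to x.
    pull = [0.0] * n
    for W in codecs:
        g = mat_t_vec(W, direction)
        pull = [p + gi / len(codecs) for p, gi in zip(pull, g)]
    delta = [0.0] * n
    for _ in range(steps):
        # Ascend mean_k <W_k (x + delta), direction> - lam * ||delta||^2.
        delta = [dl + lr * (p - 2.0 * lam * dl) for dl, p in zip(delta, pull)]
    return [a + dl for a, dl in zip(audio, delta)]


rng = random.Random(0)
n, d = 256, 8  # toy waveform length and latent dimension (assumed values)
shared = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(d)]
surrogates = [make_codec(shared, rng) for _ in range(3)]  # used during embedding
unseen = make_codec(shared, rng)                          # held out for evaluation

raw = [rng.gauss(0, 1) for _ in range(d)]
norm = math.sqrt(sum(r * r for r in raw))
direction = [r / norm for r in raw]  # secret unit direction in latent space

audio = [rng.gauss(0, 1) for _ in range(n)]  # stand-in for a real waveform
marked = embed(audio, surrogates, direction)

# Shift in the detection statistic on a codec that was not used for embedding.
gain = score(unseen, marked, direction) - score(unseen, audio, direction)
print("score gain on unseen codec:", gain)
```

In this toy setting the watermark transfers to the held-out encoder only because every encoder was constructed around the same shared map, which is a deliberately strong stand-in for the "shared latent invariants" the abstract refers to. A real system would need actual codec encoders, a perceptual constraint, and a calibrated detection threshold.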

Red-Teaming & Adversarial Robustness | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 51

Year: 2026

Venue: N/A
