This paper investigates multi-stream audio watermarking, in which individual audio stems are watermarked before mixing and the watermarks are decoded from each stem after the mix is separated. The authors find that naively combining existing watermarking and separation techniques performs poorly because of separation artifacts. To address this, they propose a joint training framework for the watermarking system and the separator, which significantly improves watermark recovery after separation.
Jointly training audio watermarking and source separation unlocks robust multi-stream watermarking, enabling independent tracking of individual audio components within a mix.
Modern audio is created by mixing stems from different sources, raising the question: can we independently watermark each stem and recover all watermarks after separation? We study a separation-first, multi-stream watermarking framework: embedding distinct information into stems using unique keys but a shared structure, mixing, separating, and decoding from each output. A naive pipeline (robust watermarking + off-the-shelf separation) yields poor bit recovery, showing that robustness to generic distortions does not ensure robustness to separation artifacts. To address this, we jointly train the watermark system and the separator in an end-to-end manner, encouraging the separator to preserve watermark cues while adapting the embedding to separation-specific distortions. Experiments on speech+music and vocal+accompaniment mixtures show substantial gains in post-separation recovery while maintaining perceptual quality.
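The embed, mix, separate, decode flow described in the abstract can be illustrated with a toy sketch. This is not the paper's method: it swaps in a classical spread-spectrum watermark for the learned embedder and decoder, and `toy_separate` is a hypothetical stand-in for a real separator that returns each watermarked stem plus a scaled leak of the other stem. All function names, keys, and parameter values (`alpha`, `chips_per_bit`, `leak`) are assumptions chosen for illustration, and the embedding strength is far too high to be imperceptible.

```python
import random


def chips_for(key, n):
    """Pseudo-random +/-1 chip sequence derived from a per-stem key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(stem, bits, key, alpha, chips_per_bit):
    """Add a keyed spread-spectrum watermark: one chip segment per payload bit."""
    chips = chips_for(key, len(bits) * chips_per_bit)
    out = list(stem)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * chips_per_bit, (i + 1) * chips_per_bit):
            out[j] += alpha * sign * chips[j]
    return out


def decode(signal, n_bits, key, chips_per_bit):
    """Recover bits from the sign of the correlation with the keyed chips."""
    chips = chips_for(key, n_bits * chips_per_bit)
    bits = []
    for i in range(n_bits):
        seg = range(i * chips_per_bit, (i + 1) * chips_per_bit)
        corr = sum(signal[j] * chips[j] for j in seg)
        bits.append(1 if corr > 0 else 0)
    return bits


def toy_separate(mix, stems, leak):
    """Hypothetical separator stand-in (assumption): the true stem plus a
    scaled copy of everything else in the mix, mimicking interference."""
    n = len(mix)
    return [[s[j] + leak * (mix[j] - s[j]) for j in range(n)] for s in stems]


def run_pipeline(seed=0, n_bits=16, chips_per_bit=512, alpha=0.3, leak=0.3):
    """Watermark two stems with distinct keys, mix, separate, decode each
    output, and return the fraction of payload bits recovered."""
    rng = random.Random(seed)
    n = n_bits * chips_per_bit
    keys = [101, 202]  # unique key per stem, shared embedding structure
    stems = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in keys]
    payloads = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in keys]
    marked = [embed(s, b, k, alpha, chips_per_bit)
              for s, b, k in zip(stems, payloads, keys)]
    mix = [sum(samples) for samples in zip(*marked)]
    estimates = toy_separate(mix, marked, leak)
    correct = 0
    for est, bits, key in zip(estimates, payloads, keys):
        decoded = decode(est, n_bits, key, chips_per_bit)
        correct += sum(d == b for d, b in zip(decoded, bits))
    return correct / (n_bits * len(keys))
```

Calling `run_pipeline()` returns the post-separation bit recovery rate for this toy setup. Because the stand-in separator only adds linear leakage, it does not reproduce the artifacts of a real separator that the abstract identifies as the cause of poor recovery in the naive pipeline.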