This paper introduces DiffAnon, a diffusion-based voice anonymization method leveraging classifier-free guidance to offer continuous control over prosody preservation during inference. By refining acoustic details over semantic embeddings from an RVQ codec, DiffAnon enables smooth interpolation between anonymization strength and prosodic fidelity. Experiments demonstrate a structured trade-off between utility and privacy, achieving competitive privacy while preserving prosodic information at controllable operating points.
DiffAnon gives voice anonymization a smooth, tunable knob for balancing privacy against prosody preservation, instead of forcing you to pick just one.
To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard prosody for privacy or lack a principled mechanism to control the utility-privacy trade-off, operating at fixed design points. We propose DiffAnon, a diffusion-based anonymization method with classifier-free guidance (CFG) that provides explicit, continuous inference-time control over prosody preservation. DiffAnon refines acoustic detail over semantic embeddings of an RVQ codec, enabling smooth interpolation between anonymization strength and prosodic fidelity within a single model. To the best of our knowledge, it is the first voice anonymization framework to provide structured, interpolatable inference-time prosody control. Experiments demonstrate structured trade-off behavior, achieving strong utility while maintaining competitive privacy across controllable operating points.
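The abstract does not spell out how the guidance is computed, so the following is only a minimal sketch of the standard classifier-free guidance combination, under the assumption that the conditional branch is the one that sees prosody information. The function name `cfg_combine`, its parameters, and the toy vectors are hypothetical and are not taken from the paper. It illustrates how a single scalar chosen at inference time can interpolate between two predictions of the same model.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Standard classifier-free guidance mix of two denoiser outputs.

    eps_uncond: prediction with the condition dropped
                (assumed here: no prosody information).
    eps_cond:   prediction with the condition present
                (assumed here: prosody-conditioned).
    guidance_scale: 0.0 returns the unconditional prediction,
                    1.0 returns the conditional one, values in
                    between interpolate, values above 1.0 extrapolate.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy two-dimensional predictions, purely illustrative.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

for scale in (0.0, 0.5, 1.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

In this sketch the scale plays the role of the inference-time control: sweeping it traces a path between the two predictions without retraining, which is the general mechanism behind "controllable operating points" in guidance-based models. How DiffAnon maps the scale to anonymization strength is described in the paper itself, not here.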