The Chinese University of Hong Kong, Tencent Ethereal Audio Lab
Forget flat, lifeless speech: this model uses self-critique to generate expressive speech rivaling GPT-4o-Audio, even with significantly less training data.
Achieve accent normalization with interpretable and controllable accent strength by selectively reusing self-supervised speech tokens via masked discrete diffusion.
Forget collecting real L2 speech data: this accent normalization method trains on synthetic L2 speech generated from text, achieving better content preservation and naturalness than models trained on real data.