Tianjin University
Continuous-target modeling reveals a shared semantic mapping for ASR and S2TT, challenging conventional views on their error sources.
Current audio editing models are failing spectacularly, with an Exact Match Rate below 5% on complex tasks, exposing a critical need for improvement.
Unlock scalable, high-quality singing voice synthesis by directly generating structured musical scores from audio, outperforming existing systems on multiple datasets.
Interactive voice conversion just got real: X-VC achieves state-of-the-art streaming WER and speaker similarity with significantly lower latency by operating directly in codec space.