Shanghai Jiao Tong University, Shanghai Innovation Institute
Interactive voice conversion just got real: X-VC achieves state-of-the-art streaming word error rate (WER) and speaker similarity at significantly lower latency by operating directly in codec space.
Unlock SOTA audio understanding by jointly training on readily available clip-level descriptions and scarce frame-level annotations, bridging the gap between global semantics and local details.
Achieve human-like full-duplex voice interactions with SoulX-Duplug, a plug-and-play module that slashes latency and improves turn management by acting as a semantic VAD.