College of Computer Science and Technology, National University of Defense Technology, State Key Laboratory of Complex & Critical Software Environment
MLLMs don't just forget language; they also suffer from perceptual drift in cross-modal spaces. MAny offers a training-free merging strategy to fix both.
Interactive voice conversion just got real: X-VC achieves state-of-the-art streaming WER and speaker similarity with significantly lower latency by operating directly in codec space.