X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, China
Finally, a speech tokenizer that doesn't require extra optimization tricks to work robustly for both generation and understanding tasks in a unified architecture.
Identity-preserving video generation just got a whole lot more faithful: FaithfulFaces maintains identity even under extreme pose variations and occlusions, a feat previous methods struggled with.
Adversarial training doesn't have to hurt speaker verification: by explicitly modeling language, you can disentangle speaker and language characteristics without sacrificing speaker discriminability.
Forget expensive audio-text data collection: TASU2 lets you dial in the perfect amount of noise for training your speech LLM, all from text.
G-STAR tackles long-form, multi-speaker ASR by giving Speech-LLMs time-aware speaker tracking, enabling robust identity linking across chunks.