In this challenging new benchmark derived from real-world conversations, visual cues become crucial for speech recognition when audio quality tanks.
Achieve more natural and synchronized video dubbing by conditioning a discrete flow-matching TTS model on facial expressions and cross-modal alignment.