Current audio-visual models nail unimodal quality but still struggle to make music and dance move together rhythmically, highlighting a key gap TMD-Bench is designed to address.
Turns out, your image-generating diffusion model already knows how to segment anything you ask it to.
Current reward models for spoken dialogue systems miss crucial paralinguistic and natural-speech cues; this new model closes the gap by operating directly on speech, and it outperforms existing audio LLMs.