Search papers, labs, and topics across Lattice.
3
0
5
12
Multimodal models can now achieve state-of-the-art performance in real-world tasks like document understanding and audio-video comprehension with significantly reduced inference latency thanks to novel token-reduction techniques.
Real-world speech disfluencies trip up even the most advanced full-duplex voice agents, exposing critical gaps in self-correction and multi-step reasoning abilities.
Text-only LLMs already contain surprisingly diverse levels of auditory knowledge, and this pre-existing knowledge strongly predicts their performance when adapted for audio-language tasks.