Search papers, labs, and topics across Lattice.
2
0
5
0
Multimodal models can now handle audio natively with improved efficiency, achieving state-of-the-art results in complex tasks like document understanding and agentic computer use.
Real-world speech disfluencies trip up even the most advanced full-duplex voice agents, exposing critical gaps in self-correction and multi-step reasoning abilities.