Turns out, your image-generating diffusion model already knows how to segment anything you ask it to.
VoxMind raises task completion rates in spoken dialogue agents from 34.88% to 74.57%, surpassing even Gemini-2.5-Pro, by integrating "Think-before-Speak" reasoning and asynchronous tool management.
Fine-grained reward signals for semantic quality and interaction timing unlock more human-like spoken dialogue models.
Reinforcement learning can now be practically applied to spoken dialogue models thanks to a new post-training recipe that disentangles semantic and acoustic improvements.
Current reward models for spoken dialogue systems miss crucial paralinguistic and natural-speech cues; this new model closes the gap by operating directly on speech, and it outperforms existing audio LLMs.
WavBench exposes the limitations of current spoken dialogue models in handling real-world conversational nuances like colloquialisms and paralinguistics, despite advances in reasoning capabilities.