Turns out, diffusion models aren't just for generating images; they're surprisingly good at understanding them too, achieving SOTA segmentation with no architectural changes.
VoxMind raises task completion rates in spoken dialogue agents from 34.88% to 74.57%, surpassing even Gemini-2.5-Pro, by integrating "Think-before-Speak" reasoning and asynchronous tool management.
Fine-grained reward signals for semantic quality and interaction timing unlock more human-like spoken dialogue models.
Reinforcement learning can now be practically applied to spoken dialogue models thanks to a new post-training recipe that disentangles semantic and acoustic improvements.
Current reward models for spoken dialogue systems miss crucial paralinguistic and natural-speech cues; this new model closes the gap by operating directly on speech, and it outperforms existing audio LLMs.
WavBench exposes the limitations of current spoken dialogue models in handling real-world conversational nuances like colloquialisms and paralinguistics, despite advances in reasoning capabilities.