Current speech generation models still fall short in maintaining consistency and capturing nuanced expressiveness when generating long-form speech, despite advances in high-fidelity synthesis.
VoxMind raises task completion rates in spoken dialogue agents from 34.88% to 74.57%, surpassing even Gemini-2.5-Pro, by integrating "Think-before-Speak" reasoning and asynchronous tool management.
Reinforcement learning can now be practically applied to spoken dialogue models thanks to a new post-training recipe that disentangles semantic and acoustic improvements.