SLMs that seem safe with text inputs can completely fail when the same content is spoken, revealing a critical "speech grounding gap" in current models.
Forget complex disentanglement architectures or low-quality synthetic targets: MimicLM achieves superior voice imitation by cleverly using synthetic speech as the *source* and real speech as the *target* in a pseudo-parallel training setup.
Standardized evaluation of nonverbal vocalizations (NVs) in TTS is now possible with NV-Bench, a new benchmark that treats NVs as communicative acts, not just acoustic artifacts.