Cinematic speech data unlocks more realistic and controllable voice generation from natural language descriptions.
Achieve controllable and scalable speech generation with MOSS-TTS, enabling zero-shot voice cloning and long-form synthesis.
A purely Transformer-based audio tokenizer, pre-trained on 3 million hours of data, outperforms existing audio codecs and even enables a fully autoregressive TTS model to outperform cascaded systems.
Open-source MOVA lets you generate synchronized, high-quality video and audio, including realistic lip sync, without relying on closed-source systems.