Ditching mel-spectrograms unlocks surprisingly better text-to-speech: LongCat-AudioDiT shows that latent diffusion over waveform representations can beat the state of the art in zero-shot voice cloning.
LongCat-Next breaks with the language-centric paradigm by unifying text, vision, and audio in a single autoregressive model with minimal modality-specific design, reconciling understanding and generation in discrete vision modeling.