Ditching mel-spectrograms can yield better text-to-speech: LongCat-AudioDiT shows that latent diffusion over waveforms can beat the state of the art in zero-shot voice cloning.