Ditch the separate models: CAST-TTS uses a single cross-attention mechanism to control TTS timbre from both speech and text, rivaling specialized models in quality.
Ditching VAE acoustic latents for semantic latents yields more semantically meaningful audio generation, outperforming traditional methods on AudioCaps.