Long-form speech generation can now achieve remarkable coherence and naturalness without extensive retraining on long-form datasets.
By decoupling patch details from semantics, Cheers achieves state-of-the-art multimodal performance at 20% of the training cost of comparable models.
Time-shifted anechoic speech beats early reflections as a training target for universal speech enhancement, leading to better perceptual quality and ASR performance.