Search papers, labs, and topics across Lattice.
3
0
6
11
Achieve real-time, high-fidelity speech synthesis with a 48ms latency by directly generating compressed audio codec tokens, bypassing the neural vocoder bottleneck.
Hallucination in abstractive summarization? Injecting named entities into the decoder input, along with multimodal embeddings, can keep your French BART model grounded.
Now a single speech foundation model can generate diverse utterance-level representations, like semantics and speaker identity, opening new possibilities for multimodal and multilingual applications.