LLMs can outperform humans in predicting the next speaker in meetings, even without audio or visual data.
Dixtral achieves up to 29% absolute improvement in speaker-attributed transcription accuracy by leveraging diarization masks without risking catastrophic forgetting.
Anticipating dialogue endpoints up to 2.56 seconds ahead can slash latency by over half while enhancing computational efficiency in real-time speech interactions.
A groundbreaking dataset of 313 hours of real-world code-switched speech reveals rich patterns and frequencies previously overlooked in multilingual research.
Incremental speech quality assessment can be dramatically improved by modeling it as a multi-resolution task, achieving a 48% reduction in error on partial audio inputs.