Speaker-attributed ASR gets a serious boost when a speech-aware LLM is jointly trained to emit speaker cluster tags, outperforming traditional pipelines.
Despite advances in expressive speech synthesis, current TTS systems often miss subtle but crucial contextual cues, failing to emphasize the correct words even when the context makes the intended meaning clear.
Skip reinforcement learning and still get SOTA vision-language reasoning performance with a simple loss re-weighting scheme that cuts training time by 7x.
LLM-based ASR can be sped up by 4.4x with minimal accuracy loss by using a CTC encoder to speculatively generate draft transcriptions.
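The speculative-decoding claim above follows the standard draft-and-verify loop: a fast model proposes several tokens at once, the expensive model checks them, and only a mismatch falls back to a single expensive decoding step. A minimal sketch of that loop, assuming the drafter and verifier are simple stand-in functions (the paper's actual CTC encoder and LLM are not reproduced here):

```python
def speculative_decode(draft_fn, verify_fn, prefix, max_len):
    """Generic draft-and-verify decoding loop (a sketch, not the paper's code).

    draft_fn(out)  -> list of cheap guess tokens continuing `out`
    verify_fn(out) -> the expensive model's single next token after `out`
    """
    out = list(prefix)
    while len(out) < max_len:
        draft = draft_fn(out)
        matched = 0
        for tok in draft:
            if len(out) < max_len and verify_fn(out) == tok:
                out.append(tok)      # verifier agrees: token accepted cheaply
                matched += 1
            else:
                break
        if matched == len(draft) and matched > 0:
            continue                 # whole draft accepted, draft again
        if len(out) < max_len:
            out.append(verify_fn(out))  # mismatch or empty draft: one slow step
    return out

# Toy demo: the "slow" verifier is a lookup into a fixed target transcript,
# and the drafter guesses the next three tokens (hypothetical stand-ins).
target = [7, 7, 3, 5, 9, 1]
verify = lambda out: target[len(out)]
draft = lambda out: target[len(out):len(out) + 3]
print(speculative_decode(draft, verify, [], max_len=len(target)))  # [7, 7, 3, 5, 9, 1]
```

When drafts are mostly accepted, the expensive model runs far fewer sequential steps, which is where the reported speedup would come from; a drafter that is often wrong degrades gracefully back to one-token-at-a-time decoding.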
Ditch slow, sequential decoding: NLE achieves 27x speedup over autoregressive ASR by using a non-autoregressive, LLM-based transcript editing approach.