Integrating visual cues into a long-context ASR system reduces word error rate by 16% on multi-talker conversational recordings, indicating the benefit of audio-visual (AV) fusion.