Search papers, labs, and topics across Lattice.
Brno University of Technology
2
0
4
Integrating visual cues into a long-context ASR system slashes word error rate by 16% in multi-talker conversational recordings, proving the power of AV fusion.
Achieve single-pass alignment of multi-talker speech – a feat previously impossible – by modeling overlaps as shuffles.