CueNet achieves robust audio-visual speaker extraction under visual degradation by disentangling and then integrating three cues: speaker information, acoustic synchronisation, and semantic synchronisation. It does so without requiring training on degraded visual data.