University of Science and Technology Beijing
Current deepfake detectors fall flat when faced with forged listening reactions, but a new motion- and audio-aware network can spot the subtle tells.
Even when visual data is missing or noisy, EgoAdapt accurately determines who is talking to the camera wearer by adaptively integrating head orientation, lip movement, and robust audio features.
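To make the idea of adaptive integration concrete, here is a minimal PyTorch sketch of gated multimodal fusion. Everything here is an illustrative assumption, not EgoAdapt's published architecture: the class name `GatedFusionSpeakerID`, the input feature sizes, and the softmax gate that learns to down-weight missing or noisy modalities.

```python
# Hypothetical sketch of gated multimodal fusion (not EgoAdapt's actual code):
# each cue is embedded separately, then a learned gate down-weights cues that
# are missing or unreliable before the fused vector is classified.
import torch
import torch.nn as nn

class GatedFusionSpeakerID(nn.Module):
    def __init__(self, dim: int = 128, n_candidates: int = 8):
        super().__init__()
        # One encoder per cue; the input sizes are illustrative placeholders.
        self.head_enc = nn.Linear(6, dim)    # head orientation (e.g. pose angles)
        self.lip_enc = nn.Linear(40, dim)    # lip-movement features
        self.audio_enc = nn.Linear(80, dim)  # audio features (e.g. log-mel stats)
        # The gate predicts a reliability weight per modality from all three.
        self.gate = nn.Sequential(nn.Linear(3 * dim, 3), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(dim, n_candidates)

    def forward(self, head, lip, audio):
        h = torch.stack(
            [self.head_enc(head), self.lip_enc(lip), self.audio_enc(audio)], dim=1
        )                                          # (batch, 3, dim)
        w = self.gate(h.flatten(1)).unsqueeze(-1)  # (batch, 3, 1) modality weights
        fused = (w * h).sum(dim=1)                 # weighted sum over modalities
        return self.classifier(fused)              # logits over candidate speakers
```

A zeroed-out or noisy visual input simply earns a small gate weight, so the prediction leans on whichever cues remain informative.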
CueNet achieves robust audio-visual speaker extraction under visual degradation by disentangling and then integrating three complementary cues (speaker identity, acoustic synchronisation, and semantic synchronisation), without ever training on degraded visual data.
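For intuition, here is a minimal sketch of cue-conditioned mask estimation in PyTorch. It is not CueNet itself: the class name `CueConditionedExtractor`, the spectrogram dimensions, and the simple additive conditioning are assumptions standing in for whatever the paper actually uses.

```python
# Hypothetical sketch of cue-conditioned speaker extraction (not CueNet itself):
# three cue embeddings (speaker identity, acoustic sync, semantic sync) are
# fused into one conditioning vector that steers a mask over the mixture.
import torch
import torch.nn as nn

class CueConditionedExtractor(nn.Module):
    def __init__(self, n_freq: int = 257, cue_dim: int = 128):
        super().__init__()
        self.mix_enc = nn.Conv1d(n_freq, 256, kernel_size=3, padding=1)
        # Fuse the three cues into a single conditioning vector.
        self.cue_fuse = nn.Linear(3 * cue_dim, 256)
        self.mask_head = nn.Sequential(
            nn.Conv1d(256, n_freq, kernel_size=3, padding=1), nn.Sigmoid()
        )

    def forward(self, mix_spec, spk_cue, acoustic_cue, semantic_cue):
        # mix_spec: (batch, n_freq, frames); each cue: (batch, cue_dim)
        x = self.mix_enc(mix_spec)
        cond = self.cue_fuse(
            torch.cat([spk_cue, acoustic_cue, semantic_cue], dim=-1)
        )
        x = x + cond.unsqueeze(-1)   # broadcast the conditioning over time
        mask = self.mask_head(x)     # soft time-frequency mask in [0, 1]
        return mask * mix_spec       # estimated target-speaker spectrogram
```

Because the cues condition the mask rather than being concatenated with raw pixels, a degraded visual stream only weakens its own cue embedding instead of corrupting the whole extractor, which is one plausible reading of why such a design generalises without training on degraded video.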