MAIS, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
A new multimodal framework improves speaker diarization in movies and TV shows by combining visual cues, speech, and subtitles to handle the variability of open-world video.