Hujing Digital Media and Entertainment Group
A new multimodal framework improves speaker diarization in movies and TV shows by combining visual cues, speech, and subtitles to handle the chaos of open-world video.