University, University of Southern California, Seoul National University
MLLMs struggle to generalize in Video Temporal Grounding not only because of unseen concepts, but also because visual domain shift breaks their ability to link temporal localization with entity attention, a problem EVIDENT solves by explicitly routing adaptation through visual entity evidence.
Video-LLMs can ace complex video understanding yet still fail to tell whether something is moving left or right, revealing a surprising blind spot in their perceptual abilities.