University, University of Southern California, Seoul National University
MLLMs struggle to generalize in Video Temporal Grounding not only because of unseen concepts but because visual domain shift breaks the link between temporal localization and entity attention. EVIDENT addresses this by explicitly routing adaptation through visual entity evidence.
Natural images aren't Euclidean: modeling them on a hypersphere unlocks better generative performance.
RIS models struggle with motion-based queries, but a new approach combining data augmentation with contrastive learning closes the gap without sacrificing performance on appearance-based descriptions.
Ditch the noisy diffusion detour: ASBM finds a surprisingly direct route from data to noise, slashing sampling steps and boosting image fidelity.