Department of Electrical and Computer Engineering, Northeastern University
Video LLMs don't just get details wrong; they fundamentally distort motion and fabricate entire events, which calls for a new approach to evaluation and mitigation.
Unimodal models might already understand each other better than we thought: a shared relational structure, formalized via category theory, unlocks zero-shot cross-modal alignment.
MLLMs that ace standard Referring Expression Comprehension benchmarks still stumble when faced with images designed to eliminate shortcuts, revealing a surprising lack of robust visual reasoning.