MLLMs that ace standard Referring Expression Comprehension benchmarks still stumble when faced with images designed to eliminate shortcuts, revealing a surprising lack of robust visual reasoning.