Search papers, labs, and topics across Lattice.
Rice University
1
0
3
Current visual grounding models struggle to infer objects from contextual roles and intentions, highlighting a critical gap in their ability to perform true scene understanding.