Search papers, labs, and topics across Lattice.
1
0
3
0
Current vision-language models struggle with instance-level reasoning, but InstAP grounds textual mentions to specific spatial-temporal regions, unlocking a new level of fine-grained understanding.