The University of Tokyo
MLLMs get personality right half the time for the wrong reasons, revealing a massive "Prejudice Gap" where models fail to ground their judgments in observable behavior.
Current vision-language models struggle with instance-level reasoning, but InstAP grounds textual mentions to specific spatio-temporal regions, enabling finer-grained understanding.