Corresponding authors are Bo Cheng and Soujanya Poria.
By explicitly grounding reasoning steps to visual objects, Chain-of-Glimpse enables more accurate and interpretable video understanding, outperforming object-agnostic methods on multiple benchmarks.
Current VLMs can ace image quizzes but fumble when asked to stack blocks in a physically plausible way, revealing a critical gap in their understanding of real-world physics.