Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.
MLLMs can now process 4K videos up to 100x faster thanks to AutoGaze, which attends only to the most informative patches.