Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.
A simple resampling strategy closes the "Thinking-Acting Gap" in agentic VLMs, enabling smaller models to outperform larger ones on multimodal reasoning tasks.
MLLMs can now handle 4K videos up to 100x faster thanks to AutoGaze, which selectively attends to only the most informative patches.