Search papers, labs, and topics across Lattice.
2
0
4
MLLMs can "think" with images, but their actions often don't match their reasoning, and this paper solves that with a new training method that forces them to explain what they see.
VLMs still can't reason about spatial logic in real-world scenes, but a new benchmark and scene graph method shows how to make progress.