Unlock human-like spatial reasoning in VLMs with VLM-3R, which reconstructs 3D understanding from monocular video using instruction tuning, bypassing the need for external depth sensors.