Endowing VLMs with intrinsic 3D geometric awareness and physical interaction cues via XEmbodied substantially boosts performance on spatial reasoning and embodied tasks, surpassing existing 2D image-text pretrained models.
Latent reasoning can beat explicit Chain-of-Thought, but only if it is forced to learn causal dynamics through a visual world model rather than through language alone.
Autonomous driving models can achieve remarkable zero-shot generalization by using large-scale video generation models to jointly predict future actions and visuals.