Pruning 90% of visual tokens without sacrificing performance could revolutionize the efficiency of 3D scene understanding in multimodal models.
Latent reasoning can beat explicit Chain-of-Thought – but only if you force it to learn causal dynamics via a visual world model, not just language.
Training a single point cloud encoder across diverse 3D domains not only improves perception but also unlocks emergent behaviors and enhances robotic manipulation and spatial reasoning.