By cleverly anchoring diffusion sampling near plausible solutions and adding a lightweight residual correction, AnchorVLA achieves robust mobile manipulation with significantly reduced inference costs.
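The core idea can be sketched generically: instead of denoising from pure Gaussian noise, start the chain from a lightly noised copy of a plausible anchor action and run only a few steps, then apply a small residual correction. This is a minimal illustrative sketch of that pattern, not AnchorVLA's actual implementation; `denoise_step`, `residual_fn`, and all parameters are hypothetical.

```python
import numpy as np

def anchored_sample(anchor, denoise_step, residual_fn,
                    n_steps=10, noise_scale=0.1, seed=0):
    """Illustrative anchored-diffusion sampler (names are assumptions,
    not AnchorVLA's API).

    anchor:       a plausible solution (e.g. a nominal action vector)
    denoise_step: one reverse-diffusion step, x_{t-1} = f(x_t, t)
    residual_fn:  lightweight learned correction applied after sampling
    """
    rng = np.random.default_rng(seed)
    # Initialize near the anchor rather than from pure noise, so a short
    # denoising chain (low inference cost) suffices.
    x = anchor + noise_scale * rng.standard_normal(anchor.shape)
    for t in range(n_steps, 0, -1):
        x = denoise_step(x, t)
    # Cheap residual head nudges the result back toward feasibility.
    return x + residual_fn(x, anchor)
```

With a toy contractive `denoise_step` and a zero residual, the sampler converges close to the denoiser's fixed point in a handful of steps, illustrating why anchoring cuts the step count.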
A single spatial token, learned via occupancy prediction on a massive dataset, is surprisingly effective at injecting crucial spatial awareness into vision-language navigation, leading to state-of-the-art performance.
Forget hand-crafted heuristics: this new dynamics-aware policy learns to exploit contact forces in cluttered environments, outperforming traditional methods by 25% in simulation and showing impressive sim-to-real transfer.
Achieve more realistic and physically plausible scene reconstructions from video by explicitly optimizing viewpoints for object generation and synthesizing scene graphs within a 3D simulator.
Training a robot foundation model on 30,000 hours of heterogeneous embodied data lets it outperform prior methods by up to 48% on complex manipulation tasks and even benefit from low-quality data.
Forget painstakingly labeled real-world data: GraspVLA proves you can train a surprisingly capable grasping foundation model on a billion frames of purely synthetic action data.