A single spatial token, learned via occupancy prediction on a massive dataset, is surprisingly effective at injecting crucial spatial awareness into vision-language navigation, leading to state-of-the-art performance.
Forget hand-crafted heuristics: this new dynamics-aware policy learns to exploit contact forces in cluttered environments, outperforming traditional methods by 25% in simulation and showing impressive sim-to-real transfer.
Reconstructing scenes from video for simulation is now more realistic thanks to a pipeline that optimizes viewpoints for object generation and uses scene graphs to ensure physical plausibility.
Training a robot foundation model on 30,000 hours of heterogeneous embodied data lets it outperform prior methods by up to 48% on complex manipulation tasks and even benefit from low-quality data.
Forget painstakingly labeled real-world data: GraspVLA proves you can train a surprisingly capable grasping foundation model on a billion frames of purely synthetic action data.