Search papers, labs, and topics across Lattice.
3
0
6
0
Current vision-language models struggle with process understanding in robotic manipulation, but targeted post-training can yield significant improvements.
Revisiting spatial reasoning allows models to correct initial hypotheses with new perspectives, dramatically enhancing their accuracy in complex environments.
Achieve real-time embodied manipulation with large 3D vision models using a novel asynchronous architecture that boosts success rates by up to 51.4% while simultaneously reducing inference time.