Search papers, labs, and topics across Lattice.
2
0
4
13
Semantic masks are all you need: predicting mask dynamics in world models yields surprisingly robust and generalizable robot policies compared to predicting raw pixels.
By decoupling camera and manipulation actions and training them in a coordinated manner, SaPaVe achieves significantly higher success rates in real-world robotic manipulation tasks compared to existing end-to-end vision-language-action models.