This paper investigates Keypoint Imitation Learning (KIL) for robotic manipulation, focusing on design choices and generalization capabilities compared to RGB-based and diffusion-based methods. The authors systematically evaluate KIL across five real-world tasks using over 2000 rollouts. KIL significantly outperforms the RGB baseline (75% vs 47% success rate) and performs comparably to S2-diffusion (73%). The study also highlights limitations that KIL inherits from the foundation models used for keypoint extraction.
Keypoint Imitation Learning leaps ahead of RGB baselines in robotic manipulation, but don't expect it to dethrone diffusion models just yet.
RGB-based imitation learning requires many demonstrations to generalize to unseen objects or scenes, motivating research into intermediate representations to improve generalization for robotic manipulation. Visual foundation models enable one-shot extraction of keypoints to provide such a representation. However, it remains unclear how best to integrate them into imitation learning and when they outperform alternative representations. We combine approaches from previous works on keypoint imitation learning (KIL) and investigate several design choices to provide practical guidelines. Using over 2000 real-world rollouts, we also assess the generalization capabilities of KIL to unseen objects and scene variations. KIL achieves a 75% overall success rate across five tasks, significantly outperforming the RGB baseline (47%) and performing on par with S2-diffusion (73%). Finally, we explore the limitations of the foundation models used for keypoint extraction and extend KIL to tasks with multiple object instances. Our results confirm KIL as a data-efficient approach for robot learning, though it does not outperform alternative representations and inherits limitations of the foundation models used for keypoint extraction. All rollout videos, demonstrations, and results are available at https://kil-manipulation.github.io/.