This paper introduces a real-world continual learning dataset for vision-language-action (VLA) models, comprising four sequential manipulation tasks that span rigid-object pick-and-place, contact-rich pressing, and deformable-object folding. Experiments show that VLA models suffer substantial catastrophic forgetting when trained sequentially on this dataset. The authors then evaluate experience replay and identify the implementation details that determine how well it mitigates forgetting.
Real-world robots forget how to fold towels after learning pick-and-place, but this work shows that experience replay can help if you do it right.
Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously learned behaviors. While pioneering research has studied the continual learning of VLA models in narrowly simulated environments, this challenge remains largely unexplored under realistic conditions. To address this limitation, we construct a real-world continual learning dataset comprising four sequential manipulation tasks, spanning rigid-object pick-and-place, contact-rich pressing, and deformable-object folding. Using this dataset, we conduct comprehensive experiments and find that VLA models suffer significant catastrophic forgetting when continually learning from heterogeneous real-world demonstrations. We then systematically evaluate experience replay and uncover key implementation factors that govern its success. In summary, this work provides the first empirical study of real-world continual VLA learning and offers practical guidance for deploying long-lived robot policies.
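The abstract does not say how experience replay is implemented, so the following is only a minimal sketch of one common form of the technique, not the authors' method. It assumes a fixed-capacity buffer filled by reservoir sampling and a fixed fraction of each training batch drawn from earlier tasks. The names `ReplayBuffer`, `mixed_batch`, and `replay_ratio`, and the task labels in the usage lines, are hypothetical and chosen for illustration.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of demonstrations from earlier tasks.

    Uses reservoir sampling so every sample seen so far has an equal
    chance of being retained (an assumption, not the paper's choice).
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample

    def sample(self, k):
        # Draw up to k stored samples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_task_data, buffer, batch_size, replay_ratio, rng):
    """Build one training batch mixing new-task and replayed samples."""
    n_replay = min(int(batch_size * replay_ratio), len(buffer.items))
    n_new = batch_size - n_replay
    # New-task samples are drawn with replacement so small datasets work.
    batch = rng.choices(new_task_data, k=n_new)
    batch += buffer.sample(n_replay)
    return batch


# Usage: store demonstrations from an earlier task, then train on a new one.
buffer = ReplayBuffer(capacity=5)
for i in range(20):
    buffer.add(("pick_place", i))

new_data = [("fold", i) for i in range(10)]
batch = mixed_batch(new_data, buffer, batch_size=8, replay_ratio=0.25,
                    rng=random.Random(1))
```

In this sketch, buffer capacity and `replay_ratio` are the two knobs that trade retention of old skills against learning speed on the new task; which factors actually matter in the real-world setting is what the paper reports.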