NVIDIA, UT Austin | Mar 4, 2026 | arXiv:2603.03818

Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning

Huihan Liu, Changyeon Kim, Bo Liu, Minghuan Liu, Yuke Zhu

AI Summary

This paper investigates continual learning in large-scale pretrained Vision-Language-Action (VLA) models, contrasting them with smaller behavior cloning (BC) models trained from scratch. The authors find that VLAs resist catastrophic forgetting far better than these BC models, even with simple Experience Replay (ER) and a small replay buffer. Their analysis indicates that pretraining is crucial: it lets VLAs maintain forward learning capabilities and retain knowledge from prior tasks, so that skills that appear forgotten can be recovered rapidly through finetuning.
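To make "forgetting" concrete, the sketch below computes one definition that is common in the continual learning literature: for each earlier task, the drop from its best success rate at any earlier stage to its success rate after the final stage, averaged over tasks. This is an illustrative assumption, not necessarily the metric used in the paper; the function name `average_forgetting` and the matrix layout are hypothetical.

```python
from typing import Sequence


def average_forgetting(success: Sequence[Sequence[float]]) -> float:
    """Average forgetting over all tasks except the last one.

    success[i][j] is the success rate on task j measured after training
    stage i (tasks are learned in order 0, 1, 2, ...). Only entries with
    j <= i are used. This layout is an assumption made for illustration.
    """
    num_stages = len(success)
    if num_stages < 2:
        return 0.0  # nothing can be forgotten after a single stage

    final_stage = num_stages - 1
    drops = []
    for task in range(final_stage):
        # Best performance on this task at any stage before the final one,
        # starting from the stage at which the task was first learned.
        best_earlier = max(success[stage][task] for stage in range(task, final_stage))
        drops.append(best_earlier - success[final_stage][task])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Made-up numbers, for illustration only (not results from the paper).
    example = [
        [0.9, 0.0, 0.0],
        [0.8, 0.7, 0.0],
        [0.6, 0.7, 0.5],
    ]
    print(average_forgetting(example))
```

Under this definition, a value of zero means the final policy performs as well on every earlier task as it did at its earlier peak; a negative value means performance on earlier tasks improved.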

Key Contribution

Pretrained Vision-Language-Action models can learn new robotic skills with little or no catastrophic forgetting, even when only a small amount of past data is replayed.

Abstract

Continual learning is a long-standing challenge in robot policy learning, where a policy must acquire new skills over time without catastrophically forgetting previously learned ones. While prior work has extensively studied continual learning in relatively small behavior cloning (BC) policy models trained from scratch, its behavior in modern large-scale pretrained Vision-Language-Action (VLA) models remains underexplored. In this work, we find that pretrained VLAs are remarkably resistant to forgetting compared with smaller policy models trained from scratch. Simple Experience Replay (ER) works surprisingly well on VLAs, sometimes achieving zero forgetting even with a small amount of replay data. Our analysis reveals that pretraining plays a critical role in downstream continual learning performance: large pretrained models mitigate forgetting with a small replay buffer while maintaining strong forward learning capabilities. Furthermore, we find that VLAs can retain relevant knowledge from prior tasks even when their performance on those tasks degrades while learning new ones. This knowledge retention enables rapid recovery of seemingly forgotten skills through finetuning. Together, these insights imply that large-scale pretraining fundamentally changes the dynamics of continual learning, enabling models to continually acquire new skills over time with simple replay. Code and more information can be found at https://ut-austin-rpl.github.io/continual-vla
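As a rough illustration of the Experience Replay idea mentioned in the abstract, the sketch below keeps a small, fixed number of demonstrations per past task and mixes them into each training batch for the new task. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `ReplayBuffer` and `mixed_batch`, the per-task capacity, and the `replay_fraction` parameter are all hypothetical, and demonstrations are treated as opaque Python objects.

```python
import random
from typing import Any, Dict, List, Sequence


class ReplayBuffer:
    """Stores a small random subset of demonstrations for each past task."""

    def __init__(self, per_task_capacity: int, seed: int = 0) -> None:
        self.per_task_capacity = per_task_capacity
        self.storage: Dict[str, List[Any]] = {}
        self.rng = random.Random(seed)

    def add_task(self, task_name: str, demos: Sequence[Any]) -> None:
        # Keep at most `per_task_capacity` demos, chosen uniformly at random.
        k = min(self.per_task_capacity, len(demos))
        self.storage[task_name] = self.rng.sample(list(demos), k)

    def sample(self, n: int) -> List[Any]:
        # Draw n stored demos (with replacement) across all past tasks.
        pool = [d for demos in self.storage.values() for d in demos]
        if not pool:
            return []
        return [self.rng.choice(pool) for _ in range(n)]


def mixed_batch(
    new_task_demos: Sequence[Any],
    buffer: ReplayBuffer,
    batch_size: int,
    replay_fraction: float,
    rng: random.Random,
) -> List[Any]:
    """Build one batch that mixes new-task data with replayed data."""
    # With an empty buffer (first task), the batch is all new-task data.
    n_replay = int(round(batch_size * replay_fraction)) if buffer.storage else 0
    replayed = buffer.sample(n_replay)
    fresh = [rng.choice(list(new_task_demos)) for _ in range(batch_size - len(replayed))]
    batch = fresh + replayed
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = ReplayBuffer(per_task_capacity=2)
    buffer.add_task("task_a", [("task_a", i) for i in range(10)])
    task_b = [("task_b", i) for i in range(10)]
    print(mixed_batch(task_b, buffer, batch_size=8, replay_fraction=0.25, rng=rng))
    buffer.add_task("task_b", task_b)  # store a subset before moving on
```

In an actual training loop, each mixed batch would be passed to the policy's usual imitation-learning update; how large the buffer and replay fraction need to be is exactly the kind of question the paper studies.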

Multimodal Models, Robotics & Embodied AI, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 33

Year: 2026

Venue: N/A
