This paper investigates continual reinforcement learning (CRL) for large pre-trained Vision-Language-Action (VLA) models, challenging the assumption that complex CRL strategies are necessary to avoid catastrophic forgetting. The authors demonstrate that simple Sequential Fine-Tuning (Seq. FT) with LoRA surprisingly outperforms more sophisticated CRL methods across three models and five lifelong RL benchmarks. Their analysis reveals that the robustness of Seq. FT stems from the synergy among large pre-trained models, parameter-efficient adaptation, and on-policy RL, which together reshape the stability-plasticity trade-off.
Forget complex continual learning algorithms: simply fine-tuning large vision-language-action models with LoRA achieves surprisingly strong performance in lifelong reinforcement learning.
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
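The core recipe the abstract describes can be sketched in a toy NumPy example. This is a hypothetical illustration, not the paper's actual VLA training code: the "pretrained model" is a single frozen linear layer, the tasks are synthetic regression problems, and plain gradient descent stands in for on-policy RL. What it does show faithfully is the structural idea of Seq. FT with LoRA: the base weight `W` is never updated, only the low-rank factors `A` and `B` are trained, task after task, with no anti-forgetting machinery.

```python
import numpy as np

# Toy sketch of sequential fine-tuning with LoRA (hypothetical setup):
# frozen "pretrained" weight W plus a trainable low-rank delta B @ A,
# adapted sequentially on a stream of tasks.

rng = np.random.default_rng(0)
d, r = 16, 2                        # feature dim, LoRA rank (r << d)
W = rng.normal(size=(d, d))         # frozen pretrained weight
W0 = W.copy()                       # kept only to verify W never changes
A = np.zeros((r, d))                # LoRA down-projection, zero-init (delta starts at 0)
B = rng.normal(size=(d, r)) * 0.1   # LoRA up-projection

def mse(X, Y):
    """Loss of the adapted layer x -> x @ (W + B A)^T."""
    return float(np.mean((X @ (W + B @ A).T - Y) ** 2))

def train_task(X, Y, steps=200, lr=0.05):
    """Full-batch gradient descent on one task, updating only A and B."""
    global A, B
    init = mse(X, Y)
    for _ in range(steps):
        err = X @ (W + B @ A).T - Y    # residual, shape (n, d)
        G = err.T @ X / len(X)         # gradient w.r.t. the full delta B @ A
        gA, gB = B.T @ G, G @ A.T      # chain rule into the two factors
        A -= lr * gA
        B -= lr * gB
    return init, mse(X, Y)

# Two toy "tasks": each is a small task-specific shift of the base map.
losses = []
for seed in (1, 2):
    tr = np.random.default_rng(seed)
    X = tr.normal(size=(64, d))
    Y = X @ (W0 + tr.normal(size=(d, d)) * 0.05).T
    losses.append(train_task(X, Y))

print(losses)   # (initial, final) MSE per task
```

The design point mirrors the abstract's claim: because the frozen backbone carries the shared knowledge and only a tiny low-rank delta moves, each new task's adaptation is cheap and has limited capacity to overwrite what came before.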