This paper introduces FORCE, a three-stage framework that makes reinforcement learning (RL) fine-tuning of Vision-Language-Action (VLA) models more sample efficient by addressing two problems: catastrophic initial unlearning and inefficient policy updates. A Value-Calibrated Warm-Up phase first calibrates the Q-function on on-policy rollouts; during online training, the calibrated Q-function then filters the policy's action proposals and expert data so that only high-value actions drive the policy update. The framework achieves a 79% absolute improvement in success rates, outperforms prior RL methods by 10%, and accelerates training by 32.5%, all without human intervention.
FORCE achieves a 79% absolute improvement in success rates for VLA models without relying on costly human interventions during training.
Vision-Language-Action (VLA) models are often constrained by the imitation ceiling imposed by sub-optimal data. While Reinforcement Learning (RL) fine-tuning can surpass this limit, it is notoriously sample-inefficient. This challenge arises from two core issues: (1) catastrophic initial unlearning due to an unstable Q-function and (2) inefficient policy updates caused by low-quality exploration data, often forcing a reliance on costly human interventions. We introduce FORCE, a three-stage framework that stabilizes fine-tuning by tackling both issues. FORCE first incorporates a Value-Calibrated Warm-Up phase, utilizing on-policy rollouts to mitigate the distributional shift of the Q-function. Subsequently, during the online stage, this calibrated Q-function acts as a filter for both the policy's own action proposals and expert data, ensuring only high-value actions are used for the policy update. We evaluate FORCE on various simulation and real-world tasks, and the results show that FORCE achieves a 79% absolute improvement in success rates and outperforms prior RL methods by 10%, while accelerating training by 32.5%. Critically, it mitigates the common success rate drop and achieves this robust performance without human intervention, marking a significant step towards deploying capable and autonomous robotic agents.
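The online-stage filtering idea can be illustrated with a minimal Python sketch. The abstract does not specify the filtering rule, so everything here is an assumption made for illustration: the function name `filter_high_value_actions`, the `margin` parameter, the choice of baseline (the mean Q-value of the policy's own proposals), and the toy critic `toy_q`. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
QFunction = Callable[[State, Action], float]


def filter_high_value_actions(
    q_fn: QFunction,
    state: State,
    policy_proposals: List[Action],
    expert_actions: List[Action],
    margin: float = 0.0,
) -> List[Tuple[Action, float]]:
    """Score policy proposals and expert actions with a Q-function and keep
    only those at or above a baseline.

    Assumption: the baseline is the mean Q-value of the policy's proposals
    plus `margin`. The source does not state which threshold FORCE uses.
    """
    if not policy_proposals:
        raise ValueError("at least one policy proposal is required")

    # Baseline computed from the policy's own proposals (hypothetical choice).
    proposal_qs = [q_fn(state, a) for a in policy_proposals]
    baseline = sum(proposal_qs) / len(proposal_qs)

    # Both sources of candidate actions pass through the same filter.
    candidates = list(policy_proposals) + list(expert_actions)
    scored = [(a, q_fn(state, a)) for a in candidates]
    kept = [(a, q) for a, q in scored if q >= baseline + margin]

    # Highest-value actions first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def toy_q(state: State, action: Action) -> float:
    """Toy critic for the demo: negative squared distance from action to state."""
    return -sum((s - a) ** 2 for s, a in zip(state, action))


if __name__ == "__main__":
    state = [0.0, 0.0]
    proposals = [[0.1, 0.0], [1.0, 1.0], [0.5, 0.5]]
    experts = [[0.0, 0.0], [2.0, 2.0]]
    for action, q in filter_high_value_actions(toy_q, state, proposals, experts):
        print(action, round(q, 3))
```

In this toy run, the low-value policy proposal `[1.0, 1.0]` and the low-value expert action `[2.0, 2.0]` are both discarded, which mirrors the abstract's point that expert data is filtered by the same Q-function rather than trusted unconditionally. A real system would replace `toy_q` with the calibrated critic produced by the warm-up phase.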