AI Laboratory, HKU, Shanghai AI Lab, SJTU, USTC
May 28, 2026 | arXiv:2605.30226

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Zhongxi Chen, Yifan Han, Yanming Shao, Huanming Liu, Congsheng Xu, Xiaoyu Chen, Yao Mu, Wenzhao Lian

AI Summary

BORA is an offline-to-online reinforcement learning (RL) post-training framework for dexterous Vision-Language-Action (VLA) models, aimed at the challenges of real-world robotic manipulation. The offline phase trains a critic on VLM cognition tokens and action chunks to provide action-conditioned value guidance, while the online phase freezes the VLA and applies a lightweight, Human-in-the-Loop (HiL) chunk-wise residual adaptation mechanism. Experiments on five complex real-world dexterous tasks show that BORA outperforms imitation learning and decoupled RL baselines, with a 33% absolute gain in average success rate under standard settings and up to a 43% improvement in unseen-object generalization.
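To make the critic's input signature concrete, here is a minimal Python sketch, assuming a pooled cognition-token vector and a small flattened action chunk. The linear critic, the sizes, and the SARSA-style TD(0) update on logged transitions are illustrative stand-ins, not BORA's actual architecture or offline objective, which this summary does not specify. The names `ChunkCritic` and `select_chunk` are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical sizes; the paper summary does not give these.
COG_DIM = 4      # length of a pooled VLM cognition-token feature
CHUNK_LEN = 3    # H: timesteps per action chunk
ACT_DIM = 2      # D: per-step action size (a real dexterous hand is far larger)

Chunk = Sequence[Sequence[float]]


def flatten(chunk: Chunk) -> List[float]:
    """Flatten an H x D action chunk into one vector."""
    return [x for step in chunk for x in step]


class ChunkCritic:
    """Linear Q(z, A) over [cognition feature; flattened action chunk].

    Only the input signature follows the description; the linear form and
    the update rule are illustrative assumptions.
    """

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = COG_DIM + CHUNK_LEN * ACT_DIM
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_in)]
        self.b = 0.0

    def features(self, z: Sequence[float], chunk: Chunk) -> List[float]:
        return list(z) + flatten(chunk)

    def q(self, z: Sequence[float], chunk: Chunk) -> float:
        x = self.features(z, chunk)
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def td_update(self, z, chunk, reward, z_next=None, chunk_next=None,
                  gamma: float = 0.99, lr: float = 0.01) -> float:
        """One SARSA-style TD(0) step on a logged transition; returns the TD error.

        Pass z_next=None for a terminal transition (no bootstrap term).
        """
        bootstrap = 0.0 if z_next is None else gamma * self.q(z_next, chunk_next)
        err = reward + bootstrap - self.q(z, chunk)
        x = self.features(z, chunk)
        self.w = [wi + lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += lr * err
        return err


def select_chunk(critic: ChunkCritic, z: Sequence[float], candidates: List[Chunk]) -> Chunk:
    """Value guidance: rank candidate chunks under the same visual context."""
    return max(candidates, key=lambda c: critic.q(z, c))


# Usage: identical context z, two logged chunks with different outcomes.
critic = ChunkCritic(seed=0)
z = [0.2, -0.1, 0.4, 0.0]
good = [[0.5, 0.5] for _ in range(CHUNK_LEN)]
bad = [[-0.5, -0.5] for _ in range(CHUNK_LEN)]
for _ in range(200):
    critic.td_update(z, good, reward=1.0)  # terminal transition, task succeeded
    critic.td_update(z, bad, reward=0.0)   # terminal transition, task failed
print(select_chunk(critic, z, [bad, good]) == good)  # True
```

A state-only value V(z) would assign both chunks the same score here, because z is identical; conditioning on the chunk is what lets the critic separate a good hand motion from a bad one in the same scene.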

Key Contribution

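Human-in-the-loop chunk-wise residual adaptation helps bridge the reality gap for dexterous robot manipulation, raising average success rate by 33% (absolute) under standard settings and by up to 43% in unseen-object generalization compared with pure imitation learning and decoupled RL baselines.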
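The chunk-wise residual can be sketched as follows, assuming the frozen VLA emits an H x D action chunk and a small residual head adds one bounded correction per chunk. `frozen_base_chunk`, `ChunkResidual`, the zero initialization, and the clipping bound `max_delta` are hypothetical choices for illustration; the summary does not describe the residual's architecture.

```python
from typing import List, Sequence

Chunk = List[List[float]]

HORIZON = 3   # hypothetical chunk length H
ACT_DIM = 2   # hypothetical per-step action size D


def frozen_base_chunk(obs: Sequence[float]) -> Chunk:
    """Stand-in for the frozen VLA base: a deterministic H x D chunk (hypothetical)."""
    s = sum(obs)
    return [[0.1 * s * (t + 1)] * ACT_DIM for t in range(HORIZON)]


class ChunkResidual:
    """Lightweight linear residual head predicting one correction per chunk.

    Zero-initialized, so before any online update the adapted policy reproduces
    the frozen prior exactly; each output entry is clipped to +/- max_delta.
    """

    def __init__(self, obs_dim: int, max_delta: float = 0.05) -> None:
        self.max_delta = max_delta
        # One weight row per entry of the flattened H x D correction.
        self.W = [[0.0] * obs_dim for _ in range(HORIZON * ACT_DIM)]

    def delta(self, obs: Sequence[float]) -> Chunk:
        raw = [sum(w * o for w, o in zip(row, obs)) for row in self.W]
        clipped = [max(-self.max_delta, min(self.max_delta, r)) for r in raw]
        return [clipped[t * ACT_DIM:(t + 1) * ACT_DIM] for t in range(HORIZON)]


def adapted_chunk(obs: Sequence[float], residual: ChunkResidual) -> Chunk:
    """Executed chunk = frozen base chunk + bounded residual, computed once per chunk."""
    base = frozen_base_chunk(obs)
    corr = residual.delta(obs)
    return [[b + d for b, d in zip(b_step, d_step)] for b_step, d_step in zip(base, corr)]


obs = [1.0, 2.0]
res = ChunkResidual(obs_dim=len(obs))
assert adapted_chunk(obs, res) == frozen_base_chunk(obs)  # zero residual reproduces the prior

res.W[0][0] = 10.0  # pretend an online update pushed this weight far
shift = adapted_chunk(obs, res)[0][0] - frozen_base_chunk(obs)[0][0]
print(round(shift, 6))  # 0.05, capped by max_delta
```

Zero initialization plus clipping is one way to keep the pretrained policy as a stable prior: adaptation starts exactly at the base behavior and can only drift within a bounded band. Predicting the whole correction once per chunk, rather than per timestep, is one plausible reason to operate chunk-wise, since it avoids re-deciding each low-level hand action independently.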

Abstract

Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding in real-world robotic manipulation. However, dexterous manipulation remains challenging for VLA policies due to high-dimensional hand control and compounding execution errors, which makes real-world RL post-training essential for bridging the gap between visually grounded action generation and physically reliable dexterous execution. Yet high-dimensional dexterous exploration in the real world often leads to temporal inconsistency, sample inefficiency, and hardware risks. To address these challenges, we propose BORA, an offline-to-online RL post-training framework designed for real-world dexterous VLA models. In the offline phase, BORA constructs a critic that takes both the VLM's cognition tokens and action chunks as inputs. This design enables action-conditioned value guidance, allowing the critic to evaluate dexterous hand motions rather than the visual context alone. During the subsequent online phase, BORA freezes the VLA base and introduces a lightweight, Human-in-the-Loop (HiL) chunk-wise residual adaptation mechanism that mitigates real-world execution errors and further refines the offline-learned intents within the actual physical environment. By inheriting the offline critic and employing intervention-driven rewards, BORA corrects execution discrepancies and adapts to real-world physical variations while preserving the pretrained policy as a stable prior. Extensive evaluations across five complex real-world dexterous tasks demonstrate that BORA significantly outperforms pure imitation learning and traditional decoupled RL baselines, achieving a 33% absolute increase in average success rate under standard settings and up to a 43% improvement in unseen-object generalization.
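Intervention-driven rewards can be illustrated with a small labeling routine. This is a sketch under assumed conventions, not BORA's reward definition: a fixed penalty on the chunk where a human takeover begins, a success reward on the final chunk, zero otherwise, and the human's chunk overriding the policy's whenever present. `ChunkStep`, `intervention_rewards`, and the reward magnitudes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

Chunk = List[List[float]]


@dataclass
class ChunkStep:
    policy_chunk: Chunk                  # chunk proposed by frozen base + residual
    human_chunk: Optional[Chunk] = None  # set when the operator takes over

    @property
    def intervened(self) -> bool:
        return self.human_chunk is not None

    @property
    def executed(self) -> Chunk:
        # The human's correction overrides the policy whenever it is present.
        return self.human_chunk if self.intervened else self.policy_chunk


def intervention_rewards(steps: List[ChunkStep], success: bool,
                         penalty: float = -1.0,
                         success_reward: float = 1.0) -> List[float]:
    """Assumed scheme: penalize the chunk where a takeover begins,
    reward task success on the final chunk, and give 0 otherwise."""
    rewards = []
    for i, step in enumerate(steps):
        r = 0.0
        takeover_begins = step.intervened and (i == 0 or not steps[i - 1].intervened)
        if takeover_begins:
            r += penalty
        if success and i == len(steps) - 1:
            r += success_reward
        rewards.append(r)
    return rewards


a, h = [[0.0, 0.0]], [[0.3, -0.1]]  # toy one-step chunks: policy vs. human
episode = [ChunkStep(a), ChunkStep(a), ChunkStep(a, h), ChunkStep(a, h), ChunkStep(a)]
print(intervention_rewards(episode, success=True))  # [0.0, 0.0, -1.0, 0.0, 1.0]
print([s.executed is h for s in episode])            # [False, False, True, True, False]
```

Rewards labeled this way, paired with the executed chunks, would then serve as online transitions for updating the critic inherited from the offline phase, so that chunks which tend to trigger a takeover lose value over time.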

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
