BORA is an offline-to-online RL post-training framework for dexterous Vision-Language-Action (VLA) models. It targets the temporal inconsistency, sample inefficiency, and hardware risks of real-world dexterous exploration. The offline phase builds a critic over VLM cognition tokens and action chunks to provide action-conditioned value guidance. The online phase adds a lightweight, Human-in-the-Loop (HiL), chunk-wise residual adaptation mechanism on top of the frozen VLA. Across five complex real-world dexterous tasks, BORA outperforms imitation learning and decoupled RL baselines, with a 33% absolute gain in average success rate and up to a 43% improvement in unseen-object generalization.
Human-in-the-loop, chunk-wise residual adaptation narrows the real-world execution gap for dexterous robot manipulation, raising average success rate by 33% (absolute) over imitation learning and decoupled RL baselines and improving unseen-object generalization by up to 43%.
Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding in real-world robotic manipulation. Dexterous manipulation, however, remains challenging for VLA policies because of high-dimensional hand control and compounding execution errors. Real-world RL post-training is therefore essential for bridging the gap between visually grounded action generation and physically reliable dexterous execution. Yet high-dimensional dexterous exploration in the real world often causes temporal inconsistency, sample inefficiency, and hardware risks. To address these challenges, we propose BORA, an offline-to-online RL post-training framework designed for real-world dexterous VLA models. In the offline phase, BORA constructs a critic that takes both the VLM's cognition tokens and action chunks as inputs. This design enables action-conditioned value guidance, allowing the critic to evaluate dexterous hand motions rather than visual context alone. During the subsequent online phase, BORA freezes the VLA base and introduces a lightweight, Human-in-the-Loop (HiL), chunk-wise residual adaptation mechanism that mitigates real-world execution errors and further refines the offline-learned intents in the actual physical environment. By inheriting the offline critic and using intervention-driven rewards, BORA corrects execution discrepancies and adapts to real-world physical variations while preserving the pretrained policy as a stable prior. Extensive evaluations across five complex real-world dexterous tasks show that BORA significantly outperforms pure imitation learning and traditional decoupled RL baselines, achieving a 33% absolute increase in average success rate under standard settings and up to a 43% improvement in unseen-object generalization.
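The abstract combines two mechanisms: a critic conditioned on both cognition tokens and an action chunk, and a residual correction applied on top of a frozen VLA base. The following is a minimal NumPy sketch of how those pieces could fit together, under stated assumptions; it is not BORA's implementation. The linear critic and residual networks, the chunk sizes, the residual bound, the pooling of cognition tokens into one vector, and the reward shaping in the hypothetical `intervention_reward` (a fixed penalty when a human takes over) are all illustrative placeholders.

```python
import numpy as np

# All sizes below are hypothetical placeholders, not values from BORA.
CHUNK_LEN = 8          # H: actions per chunk
ACTION_DIM = 22        # D: e.g. arm + dexterous-hand joint targets
TOKEN_DIM = 64         # size of a pooled cognition-token feature vector (assumed)
RESIDUAL_SCALE = 0.05  # max per-dimension residual correction (assumed)


class ChunkCritic:
    """Action-conditioned critic Q(z, A).

    Scores a full (H, D) action chunk A given cognition-token features z,
    so two different hand motions in the same visual context get different
    values. A single linear layer stands in for a learned network.
    """

    def __init__(self, rng: np.random.Generator):
        in_dim = TOKEN_DIM + CHUNK_LEN * ACTION_DIM
        self.w = rng.normal(scale=0.01, size=in_dim)

    def __call__(self, z: np.ndarray, chunk: np.ndarray) -> float:
        x = np.concatenate([z, chunk.reshape(-1)])
        return float(self.w @ x)


def frozen_base_policy(z: np.ndarray) -> np.ndarray:
    """Stand-in for the frozen VLA base: it has no trainable state,
    so only the residual below can adapt online."""
    return np.tanh(np.tile(z[:ACTION_DIM], (CHUNK_LEN, 1)))


class ResidualPolicy:
    """Lightweight chunk-wise residual: predicts a bounded correction
    for the whole base chunk from (z, base chunk)."""

    def __init__(self, rng: np.random.Generator):
        in_dim = TOKEN_DIM + CHUNK_LEN * ACTION_DIM
        self.W = rng.normal(scale=0.01, size=(CHUNK_LEN * ACTION_DIM, in_dim))

    def __call__(self, z: np.ndarray, base_chunk: np.ndarray) -> np.ndarray:
        x = np.concatenate([z, base_chunk.reshape(-1)])
        # tanh keeps every entry of the correction within +/- RESIDUAL_SCALE
        delta = RESIDUAL_SCALE * np.tanh(self.W @ x)
        return delta.reshape(CHUNK_LEN, ACTION_DIM)


def act(z: np.ndarray, residual: ResidualPolicy) -> np.ndarray:
    """Executed chunk = frozen base chunk + residual correction."""
    base = frozen_base_policy(z)
    return base + residual(z, base)


def intervention_reward(success: bool, intervened: bool, penalty: float = 1.0) -> float:
    """Hypothetical intervention-driven reward: penalise chunks where a
    human had to take over, otherwise use a sparse success signal."""
    if intervened:
        return -penalty
    return 1.0 if success else 0.0


def td_target(reward: float, q_next: float, gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD target, e.g. for continuing to train the inherited offline critic."""
    return reward if done else reward + gamma * q_next
```

With `rng = np.random.default_rng(0)` and a feature vector `z` of length `TOKEN_DIM`, `act(z, ResidualPolicy(rng))` returns an `(8, 22)` chunk whose entries stay within 0.05 of the base chunk, and `ChunkCritic(rng)(z, chunk)` scores it. Bounding the residual with `tanh` is one simple way to keep the pretrained policy as a stable prior; the paper may use a different mechanism.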