Feb 12, 2026

arXiv:2602.12062

HoloBrain-0 Technical Report

Xuewu Lin, Tianwei Lin, Hongyu Xie, Jiawei Li, Qingze Wang, Mengdi Li, Ziang Li, Hongzhe Bi, Lichao Huang

AI Summary

The paper introduces HoloBrain-0, a Vision-Language-Action (VLA) framework designed to improve real-world robot deployment by building robot embodiment priors, namely multi-view camera parameters and kinematic descriptions (URDF), into its architecture. Using a "pre-train then post-train" paradigm, it achieves state-of-the-art results on simulation benchmarks and strong performance on real-world manipulation tasks, and its small 0.2B-parameter variant rivals significantly larger baselines. The authors open-source the entire HoloBrain ecosystem, including pre-trained models, post-trained checkpoints, and a full-stack VLA infrastructure called RoboOrchard, to facilitate research and adoption.

Key Contribution

A 0.2B-parameter VLA model rivals much larger baselines in robotic manipulation, enabling low-latency on-device deployment.

Abstract

In this work, we introduce HoloBrain-0, a comprehensive Vision-Language-Action (VLA) framework that bridges the gap between foundation model research and reliable real-world robot deployment. The core of our system is a novel VLA architecture that explicitly incorporates robot embodiment priors, including multi-view camera parameters and kinematic descriptions (URDF), to enhance 3D spatial reasoning and support diverse embodiments. We validate this design through a scalable "pre-train then post-train" paradigm, achieving state-of-the-art results on simulation benchmarks such as RoboTwin 2.0, LIBERO, and GenieSim, as well as strong results on challenging long-horizon real-world manipulation tasks. Notably, our efficient 0.2B-parameter variant rivals significantly larger baselines, enabling low-latency on-device deployment. To further accelerate research and practical adoption, we fully open-source the entire HoloBrain ecosystem, which includes: (1) powerful pre-trained VLA foundations; (2) post-trained checkpoints for multiple simulation suites and real-world tasks; and (3) RoboOrchard, a full-stack VLA infrastructure for data curation, model training, and deployment. Together with standardized data collection protocols, this release provides the community with a complete, reproducible path toward high-performance robotic manipulation.

Architecture Design (Transformers, SSMs, MoE); Multimodal Models; Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 63

Year: 2026

Venue: N/A
