Beijing Innovation Center of Humanoid | BJUT | PKU | Southwest U | State Key Laboratory of Solidification | Feb 18, 2026 | arXiv:2602.16444

RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation

Yixue Zhang, Kun Wu, Zhi Gao, Zhen Zhao, Pei Ren, Zhiyuan Xu, Fei Liao, Xinhua Wang, Shichao Fan, Di Wu, Qiuxuan Feng, Meng Li, Zhengping Che, Chang Liu, Jian Tang

Beijing Innovation Center of Humanoid Robotics, The School of Advanced Manufacturing, Robotics, Peking University, Beijing University of Technology, The School of Mechanical Engineering, Automation, B. University, State Key Laboratory of Solidification Processing, S. O. Science

AI Summary

The paper introduces RoboGene, an agentic framework for automatically generating diverse and physically plausible robotic manipulation tasks to address the scarcity of real-world robotic interaction data. RoboGene employs diversity-driven sampling, self-reflection mechanisms for physical constraint enforcement, and human-in-the-loop refinement. Experiments demonstrate that VLA models pre-trained with RoboGene-generated data achieve higher success rates and better generalization compared to those trained with data generated by SOTA foundation models like GPT-4o and Gemini 2.5 Pro.

Key Contribution

An agentic framework that generates diverse, physically plausible manipulation tasks yields better VLA pre-training than tasks generated directly by foundation models such as GPT-4o and Gemini 2.5 Pro.

Abstract

The pursuit of general-purpose robotic manipulation is hindered by the scarcity of diverse, real-world interaction data. Unlike data collection from the web in vision or language, robotic data collection is an active process incurring prohibitive physical costs. Consequently, automated task curation to maximize data value remains a critical yet under-explored challenge. Existing manual methods are unscalable and biased toward common tasks, while off-the-shelf foundation models often hallucinate physically infeasible instructions. To address this, we introduce RoboGene, an agentic framework designed to automate the generation of diverse, physically plausible manipulation tasks across single-arm, dual-arm, and mobile robots. RoboGene integrates three core components: diversity-driven sampling for broad task coverage, self-reflection mechanisms to enforce physical constraints, and human-in-the-loop refinement for continuous improvement. We conduct extensive quantitative analysis and large-scale real-world experiments, collecting datasets of 18k trajectories and introducing novel metrics to assess task quality, feasibility, and diversity. Results demonstrate that RoboGene significantly outperforms state-of-the-art foundation models (e.g., GPT-4o, Gemini 2.5 Pro). Furthermore, real-world experiments show that VLA models pre-trained with RoboGene achieve higher success rates and superior generalization, underscoring the importance of high-quality task generation. Our project is available at https://robogene-boost-vla.github.io.
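The abstract names the components but does not give their implementation. As a rough illustration only, the sketch below shows one way diversity-driven sampling could be combined with a feasibility filter: skills and objects that have been used less often are sampled with higher probability, and candidate tasks that the chosen robot cannot perform are rejected. Every name here (`sample_least_used`, `is_feasible`, `generate_tasks`, the `allowed` table) is hypothetical, and the inverse-frequency weighting and the lookup-table check are assumptions standing in for the paper's actual sampling and self-reflection mechanisms.

```python
import random
from collections import Counter


def sample_least_used(options, usage, rng, temperature=1.0):
    """Pick an option with weight 1 / (1 + times_used) ** temperature.

    Assumption: inverse-frequency weighting is only one possible way to
    encourage diversity; it is not taken from the paper.
    """
    weights = [1.0 / (1 + usage[o]) ** temperature for o in options]
    return rng.choices(options, weights=weights, k=1)[0]


def is_feasible(task, allowed):
    """Toy stand-in for a physical-constraint check.

    `allowed` maps each robot type to the set of skills it can perform.
    """
    return task["skill"] in allowed[task["robot"]]


def generate_tasks(skills, objects, robots, allowed, n, seed=0, max_tries=1000):
    """Generate up to n feasible tasks, favouring rarely used skills and objects."""
    rng = random.Random(seed)
    skill_use, object_use = Counter(), Counter()
    tasks = []
    tries = 0
    while len(tasks) < n and tries < max_tries:
        tries += 1
        task = {
            "robot": rng.choice(robots),
            "skill": sample_least_used(skills, skill_use, rng),
            "object": sample_least_used(objects, object_use, rng),
        }
        if not is_feasible(task, allowed):
            continue  # rejected candidates do not count toward usage
        skill_use[task["skill"]] += 1
        object_use[task["object"]] += 1
        tasks.append(task)
    return tasks


if __name__ == "__main__":
    # Hypothetical skill, object, and robot vocabularies for illustration.
    skills = ["pick", "pour", "wipe", "carry"]
    objects = ["cup", "sponge", "box"]
    robots = ["single_arm", "dual_arm", "mobile"]
    allowed = {
        "single_arm": {"pick", "pour", "wipe"},
        "dual_arm": {"pick", "pour", "wipe", "carry"},
        "mobile": {"pick", "carry"},
    }
    for task in generate_tasks(skills, objects, robots, allowed, n=5):
        print(task)
```

A human-in-the-loop step could be layered on top by having a reviewer edit the `allowed` table or discard generated tasks, but that is likewise an assumption about how such refinement might be wired in.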

Data Curation & Synthetic Data | Robotics & Embodied AI | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 60

Year: 2026

Venue: N/A
