Tencent AI | TJU | UCSD | Jun 9, 2026 | arXiv:2606.11324

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Yifu Yuan, Yao-Ting Huang, Xianze Yao, Yutong Li, Shuoheng Zhang, Linqi Han, Pengyi Li, Jiangeng Sun, Wenting Jia, Yuhao Liu, Ruihao Liao, Yucheng Hu, Qiyu Wu, Yuxiao Li, Zibin Dong, Fei Ni, Yan Zheng, Shuyang Gu, Yi Ma, Hongyao Tang, Han Hu, Jianye Hao

AI Summary

The paper introduces Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates embodied reasoning capabilities, including cognition, task planning, correction, and pointing, into a single architecture aimed at general physical intelligence. Using three automated data construction pipelines, the authors build a large-scale data system of over 15 billion tokens and design a multi-task balanced reinforcement learning recipe to alleviate conflicts between heterogeneous tasks. With only 8 billion parameters, Embodied-R1.5 achieves state-of-the-art performance on 16 of 24 embodied vision-language model (VLM) benchmarks and shows strong generalization in zero-shot real-robot experiments across multiple manipulation tasks.

Key Contribution

With only 8 billion parameters, Embodied-R1.5 achieves state-of-the-art results on 16 of 24 embodied VLM benchmarks, and a single model can plan, ground, and self-correct over long-horizon tasks through the Planner-Grounder-Corrector (PGC) closed loop.

Abstract

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like π_{0.5} across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.
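The abstract describes the Planner-Grounder-Corrector (PGC) loop only at a high level. The following is a minimal sketch of what such a closed loop could look like, assuming a single model object that exposes three roles. Every name here (`StubModel`, `plan`, `ground`, `verify`, `run_pgc`, `max_retries`) is hypothetical and is not taken from the paper or its released code; the stub replaces the real model and robot with trivial string logic.

```python
from dataclasses import dataclass, field


@dataclass
class StubModel:
    """Hypothetical stand-in for one model playing all three PGC roles."""
    # Subtasks that should fail on their first attempt (illustration only).
    fail_once: set = field(default_factory=set)

    def plan(self, instruction):
        # Planner role: split the instruction into ordered subtasks.
        return [s.strip() for s in instruction.split(" then ")]

    def ground(self, subtask):
        # Grounder role: map a subtask to a target (here a dummy 2D point).
        return (len(subtask), 0)

    def verify(self, subtask):
        # Corrector role: report whether the subtask succeeded.
        if subtask in self.fail_once:
            self.fail_once.discard(subtask)
            return False
        return True


def run_pgc(model, instruction, max_retries=2):
    """Plan once, then ground, execute, and verify each subtask with retries."""
    log = []
    for subtask in model.plan(instruction):
        for attempt in range(max_retries + 1):
            target = model.ground(subtask)
            log.append((subtask, target, attempt))  # stands in for execution
            if model.verify(subtask):
                break  # subtask succeeded, move on to the next one
        else:
            return False, log  # retries exhausted for this subtask
    return True, log


if __name__ == "__main__":
    model = StubModel(fail_once={"open drawer"})
    ok, log = run_pgc(model, "open drawer then pick cup")
    print(ok, [entry[0] for entry in log])
```

In this sketch the corrector simply triggers a retry of the same subtask; an actual system might instead replan or revise the grounded target, which the abstract does not specify.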

Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
