Amazon Science | NTU | Mar 3, 2026 | arXiv:2603.03198

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

Ziyang Gong, Zehang Luo, An-Liu Tang, Anke Tang, Zhe Liu, Shi Fu, Zhi Hou, Ganlin Yang, Weiyun Wang, Xiaofeng Wang, Jianbo Liu, Gen Luo, Haolan Kang, Shuang Luo, Yue Zhou, Yong Luo, Li Shen, Xiaosong Jia, Yao Mu, Chunxiao Liu, Junchi Yan, Hengshuang Zhao, Dacheng Tao, Xiaogang Wang

AI Summary

ACE-Brain-0, a multimodal LLM, is introduced to unify spatial reasoning, autonomous driving, and embodied manipulation by leveraging spatial intelligence as a shared scaffold across diverse embodiments. The proposed Scaffold-Specialize-Reconcile (SSR) paradigm first establishes a shared spatial foundation, cultivates domain-specialized experts, and then harmonizes them through data-free model merging, further enhanced by Group Relative Policy Optimization (GRPO). Experiments show ACE-Brain-0 achieves competitive or state-of-the-art performance across 24 spatial and embodiment-related benchmarks.

Key Contribution

Spatial intelligence may be the shared foundation for building generalist embodied agents that can drive, manipulate objects, and fly drones, all within a single model.

Abstract

Universal embodied intelligence demands robust generalization across heterogeneous embodiments, such as autonomous driving, robotics, and unmanned aerial vehicles (UAVs). However, for existing embodied brains, training a unified model over diverse embodiments frequently runs into long-tail data distributions, gradient interference, and catastrophic forgetting, making it notoriously difficult to balance universal generalization with domain-specific proficiency. In this report, we introduce ACE-Brain-0, a generalist foundation brain that unifies spatial reasoning, autonomous driving, and embodied manipulation within a single multimodal large language model (MLLM). Our key insight is that spatial intelligence serves as a universal scaffold across diverse physical embodiments: although vehicles, robots, and UAVs differ drastically in morphology, they share a common need for modeling 3D mental space, making spatial cognition a natural, domain-agnostic foundation for cross-embodiment transfer. Building on this insight, we propose the Scaffold-Specialize-Reconcile (SSR) paradigm, which first establishes a shared spatial foundation, then cultivates domain-specialized experts, and finally harmonizes them through data-free model merging. Furthermore, we adopt Group Relative Policy Optimization (GRPO) to strengthen the model's comprehensive capability. Extensive experiments demonstrate that ACE-Brain-0 achieves competitive and even state-of-the-art performance across 24 spatial and embodiment-related benchmarks.
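
The abstract names two mechanisms without spelling them out: data-free model merging and GRPO. The sketch below is illustrative only and is not the authors' implementation. It assumes task-arithmetic merging (adding each expert's weight delta back onto the shared base), which is one common data-free scheme; the abstract does not say which merging method ACE-Brain-0 uses. It also shows the group-relative advantage normalization that GRPO is generally described with. The function names, the `scale` parameter, and the toy weight dictionaries are hypothetical.

```python
from statistics import mean, pstdev


def merge_task_arithmetic(base, experts, scale=1.0):
    """Merge experts into one model without any training data.

    ASSUMPTION: task arithmetic, used here only as an example of a
    data-free merge. `base` maps parameter names to lists of floats
    (the shared spatial scaffold); each expert has the same keys.
    """
    merged = {}
    for name, base_w in base.items():
        merged[name] = [
            # Add the summed expert deltas (expert - base) to the base weight.
            b + scale * sum(e[name][i] - b for e in experts)
            for i, b in enumerate(base_w)
        ]
    return merged


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation. ASSUMPTION: population
    standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    driving_expert = {"w": [1.5, 2.0]}       # hypothetical specialist
    manipulation_expert = {"w": [1.0, 3.0]}  # hypothetical specialist
    print(merge_task_arithmetic(base, [driving_expert, manipulation_expert]))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In this toy setup the two experts changed different coordinates, so their deltas do not collide. When experts modify the same weights in conflicting directions, a plain sum can interfere, which is the kind of conflict the Reconcile stage is meant to handle.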

Robotics & Embodied AI | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
