BJUT, HIT, HKUST, Shenzhen MSU-BIT University

Jun 16, 2026

arXiv:2606.17833

HumanoidArena: Benchmarking Egocentric Hierarchical Whole-body Learning

Taowen Wang, Zikang Xie, Bin Yang, Yunheng Wang, Zizhao Yuan, Yuetong Fang, Yixiao Feng, Yichi Wang, Xingyu Chen, Haodong Chen, Qiwei Wu, Weisheng Xu, Lihan Chen, Lusong Li, Zecui Zeng, Renjing Xu

AI Summary

This paper introduces HumanoidArena, a novel benchmark designed to evaluate egocentric hierarchical whole-body learning in humanoid robots. By framing policy learning as a hierarchical decision-making problem, the benchmark assesses how well high-level policies can generate executable whole-body actions that are robust to task distribution shifts and transferable across different general motion trackers (GMTs). Experimental results reveal that while hierarchical control facilitates the execution of complex leg-critical tasks, the performance of learned policies is heavily dependent on the specific GMT used, highlighting challenges in achieving robust cross-GMT transfer.

Key Contribution

Hierarchical control in humanoid robots can solve complex leg-critical tasks, but performance is fragile and heavily reliant on the choice of motion tracker.

Abstract

Humanoid robots promise whole-body interaction in human-centered environments, but scalable policy learning remains difficult because task-level decision-making and whole-body dynamic execution are tightly coupled. A practical solution is hierarchical control, where a high-level policy predicts intermediate whole-body actions and low-level general motion trackers (GMTs) execute them as stable humanoid motion. However, existing benchmarks rarely evaluate the policy-tracker interface itself, leaving open whether intermediate whole-body actions are executable, robust under task distribution shifts, and transferable across different GMT backends. We introduce HumanoidArena, a simulation-first benchmark for egocentric hierarchical whole-body learning. The benchmark formulates policy learning as a hierarchical decision-making problem: a high-level policy converts egocentric vision, proprioception, and instructions into a compact whole-body action, which is subsequently executed by a low-level GMT. Instead of treating the legs as planar transport tools, HumanoidArena emphasizes interactions where lower-body coordination is structurally necessary for task completion. We therefore design seven leg-critical HOI/HSI tasks in which success requires foot placement, balance maintenance, posture adjustment, and whole-body reorientation. To further diagnose the hierarchical system, we evaluate policies from two complementary perspectives: perturbation-conditioned generalization and GMT-conditioned transfer. Experiments show that hierarchical control enables learned policies to solve diverse leg-critical interactions, but performance is strongly tracker-conditioned and cross-GMT transfer remains fragile. These results position HumanoidArena as a benchmark for studying transferable intermediate action representations and scalable egocentric whole-body policy learning.
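
The policy-tracker interface and the GMT-conditioned transfer evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the benchmark's API: every name here (`Observation`, `WholeBodyAction`, `HalfScalePolicy`, `ScaledTracker`, `toy_rollout`, `transfer_table`) is hypothetical, the "tracker" is a toy gain on the commanded targets rather than a real GMT, and the success criterion is an assumed tolerance check chosen only so the example runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Observation:
    """Hypothetical policy input: vision, proprioception, instruction."""
    egocentric_rgb: Sequence[float]   # stand-in for an egocentric image
    proprioception: Sequence[float]
    instruction: str


@dataclass
class WholeBodyAction:
    """Hypothetical compact intermediate action handed to the tracker."""
    target: List[float]


class HalfScalePolicy:
    """Toy high-level policy: maps proprioception to a whole-body target."""

    def act(self, obs: Observation) -> WholeBodyAction:
        return WholeBodyAction(target=[0.5 * x for x in obs.proprioception])


class ScaledTracker:
    """Toy low-level tracker: reproduces the target up to a fixed gain."""

    def __init__(self, name: str, gain: float) -> None:
        self.name = name
        self.gain = gain

    def track(self, action: WholeBodyAction) -> List[float]:
        return [self.gain * x for x in action.target]


def toy_rollout(policy, tracker, obs: Observation) -> bool:
    """Assumed success criterion: worst-case tracking error below 0.1."""
    action = policy.act(obs)
    command = tracker.track(action)
    error = max(abs(c - t) for c, t in zip(command, action.target))
    return error < 0.1


def transfer_table(policy, trackers, episodes) -> Dict[str, float]:
    """Success rate of one fixed policy under each tracker backend."""
    table = {}
    for tracker in trackers:
        wins = sum(toy_rollout(policy, tracker, obs) for obs in episodes)
        table[tracker.name] = wins / len(episodes)
    return table


episodes = [
    Observation([0.0], [1.0, -1.0], "step onto the platform"),
    Observation([0.0], [0.2, 0.1], "step onto the platform"),
]
trackers = [ScaledTracker("tracker_a", 1.0), ScaledTracker("tracker_b", 0.5)]
table = transfer_table(HalfScalePolicy(), trackers, episodes)
print(table)
```

In this toy setting the same policy succeeds on both episodes with `tracker_a` but on only one with `tracker_b`, which mirrors, in a highly simplified form, what tracker-conditioned performance means: the policy is held fixed and only the low-level backend changes.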

Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
