This paper introduces HumanoidArena, a novel benchmark designed to evaluate egocentric hierarchical whole-body learning in humanoid robots. By framing policy learning as a hierarchical decision-making problem, the benchmark assesses how well high-level policies can generate executable whole-body actions that are robust to task distribution shifts and transferable across different general motion trackers (GMTs). Experimental results reveal that while hierarchical control facilitates the execution of complex leg-critical tasks, the performance of learned policies is heavily dependent on the specific GMT used, highlighting challenges in achieving robust cross-GMT transfer.
Hierarchical control in humanoid robots can solve complex leg-critical tasks, but performance is fragile and heavily reliant on the choice of motion tracker.
Humanoid robots promise whole-body interaction in human-centered environments, but scalable policy learning remains difficult because task-level decision-making and whole-body dynamic execution are tightly coupled. A practical solution is hierarchical control, where a high-level policy predicts intermediate whole-body actions and low-level general motion trackers (GMTs) execute them as stable humanoid motion. However, existing benchmarks rarely evaluate the policy-tracker interface itself, leaving open whether intermediate whole-body actions are executable, robust under task distribution shifts, and transferable across different GMT backends. We introduce HumanoidArena, a simulation-first benchmark for egocentric hierarchical whole-body learning. The benchmark formulates policy learning as a hierarchical decision-making problem: a high-level policy converts egocentric vision, proprioception, and instructions into a compact whole-body action, which is subsequently executed by a low-level GMT. Instead of treating the legs as planar transport tools, HumanoidArena emphasizes interactions where lower-body coordination is structurally necessary for task completion. We therefore design seven leg-critical human-object interaction (HOI) and human-scene interaction (HSI) tasks in which success requires foot placement, balance maintenance, posture adjustment, and whole-body reorientation. To further diagnose the hierarchical system, we evaluate policies from two complementary perspectives: perturbation-conditioned generalization and GMT-conditioned transfer. Experiments show that hierarchical control enables learned policies to solve diverse leg-critical interactions, but performance is strongly tracker-conditioned and cross-GMT transfer remains fragile. These results position HumanoidArena as a benchmark for studying transferable intermediate action representations and scalable egocentric whole-body policy learning.
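To make the policy-tracker interface and the idea of GMT-conditioned transfer concrete, here is a minimal toy sketch in Python. It is not HumanoidArena's API or evaluation code. Every name in it (`Observation`, `make_tracker`, `make_policy`, `rollout`, `transfer_matrix`, the `gmt_a`/`gmt_b` labels, the example instruction, and the gain and tolerance values) is a hypothetical stand-in. Each tracker is modeled as a fixed gain applied to the commanded action, and each policy is assumed to pre-compensate for the one tracker it was tuned on, so the resulting matrix only illustrates why a policy can succeed with its own tracker and fail with another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical inputs to the high-level policy."""
    egocentric_image: List[float]  # stand-in for camera pixels
    proprioception: List[float]    # stand-in for joint states
    instruction: str               # natural-language task instruction


Action = List[float]                        # toy compact whole-body action
Policy = Callable[[Observation], Action]    # high level: observation -> action
Tracker = Callable[[Action], List[float]]   # low level (GMT): action -> motion


def make_tracker(gain: float) -> Tracker:
    """Toy GMT: scales every commanded value by a fixed gain."""
    return lambda action: [gain * a for a in action]


def make_policy(target: List[float], tuned_gain: float) -> Policy:
    """Toy policy that pre-compensates for the tracker it was tuned on."""
    return lambda obs: [t / tuned_gain for t in target]


def rollout(policy: Policy, tracker: Tracker, obs: Observation,
            target: List[float], tol: float = 0.05) -> bool:
    """Run one policy -> tracker step; succeed if motion is within tol of target."""
    motion = tracker(policy(obs))
    return all(abs(m - t) <= tol for m, t in zip(motion, target))


def transfer_matrix(gains: Dict[str, float], obs: Observation,
                    target: List[float]) -> Dict[str, Dict[str, bool]]:
    """Success of each policy (rows, tuned per tracker) on each tracker (columns)."""
    trackers = {name: make_tracker(g) for name, g in gains.items()}
    policies = {name: make_policy(target, g) for name, g in gains.items()}
    return {
        src: {dst: rollout(policies[src], trackers[dst], obs, target)
              for dst in trackers}
        for src in policies
    }


# Made-up observation, instruction, and target motion for illustration only.
obs = Observation([0.0], [0.0], "step onto the platform")
target = [0.30, -0.10, 0.20]
matrix = transfer_matrix({"gmt_a": 1.0, "gmt_b": 0.8}, obs, target)

for src, row in matrix.items():
    print(src, row)
```

In this toy setup the diagonal entries succeed and the off-diagonal entries fail, which mirrors the tracker-conditioned pattern described above in a deliberately simplified form. A real evaluation would replace the boolean with success rates over many episodes and perturbations.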