Mar 9, 2026 · arXiv:2603.08619

Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery

Nehar Poddar, Stephen McCrory, Luigi Penco, Geoffrey Clark, Hakki Erhan Svil, Robert J. Griffin

AI Summary

This paper introduces a reinforcement learning approach for humanoid robot recovery that explicitly incorporates classical balance metrics (capture point, center-of-mass state, and centroidal momentum) into the critic network and reward function. By using these metrics as privileged information during training, the RL agent learns a unified policy capable of diverse recovery strategies, from ankle adjustments to compliant falling and stand-up maneuvers. The resulting policy achieves a 93.4% recovery rate on the Unitree H1-2 in simulation and demonstrates sim-to-sim transfer, highlighting the importance of balance-informed structure for robust recovery behaviors.
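For readers unfamiliar with the first of these metrics, the sketch below computes the instantaneous capture point using the standard linear inverted pendulum approximation, where the capture point is the center-of-mass position offset by its velocity divided by the pendulum's natural frequency. The function name, argument layout, and gravity constant are illustrative assumptions, not taken from the paper, which may compute the quantity differently.

```python
import math

GRAVITY = 9.81  # m/s^2, assumed constant for this sketch


def capture_point(com_xy, com_vel_xy, com_height):
    """Instantaneous capture point under the linear inverted pendulum model.

    com_xy      -- horizontal center-of-mass position (x, y) in meters
    com_vel_xy  -- horizontal center-of-mass velocity (vx, vy) in m/s
    com_height  -- center-of-mass height above the ground in meters

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if com_height <= 0.0:
        raise ValueError("com_height must be positive")
    # Natural frequency of the pendulum: omega = sqrt(g / z).
    omega = math.sqrt(GRAVITY / com_height)
    # Capture point: xi = x + x_dot / omega, applied per horizontal axis.
    return (
        com_xy[0] + com_vel_xy[0] / omega,
        com_xy[1] + com_vel_xy[1] / omega,
    )
```

Under this model, a robot at rest has its capture point directly below the center of mass, and a faster center-of-mass velocity pushes the capture point further out in the direction of motion, which is why it is a common indicator of whether a step is needed.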

Key Contribution

Humanoid robots can recover from falls with a 93.4% success rate in simulation when classical balance principles are built into RL, enabling diverse strategies from ankle adjustments to compliant falling.

Abstract

Humanoid robots remain vulnerable to falls and unrecoverable failure states, limiting their practical utility in unstructured environments. While reinforcement learning has demonstrated stand-up behaviors, existing approaches treat recovery as a pure task-reward problem without an explicit representation of the balance state. We present a unified RL policy that addresses this limitation by embedding classical balance metrics (capture point, center-of-mass state, and centroidal momentum) as privileged critic inputs and by shaping rewards directly around these quantities during training, while the actor relies solely on proprioception for zero-shot hardware transfer. Without reference trajectories or scripted contacts, a single policy spans the full recovery spectrum: ankle and hip strategies for small disturbances, corrective stepping under large pushes, and compliant falling with multi-contact stand-up using the hands, elbows, and knees. Trained on the Unitree H1-2 in Isaac Lab, the policy achieves a 93.4% recovery rate across randomized initial poses and unscripted fall configurations. An ablation study shows that removing the balance-informed structure causes stand-up learning to fail entirely, confirming that these metrics provide a meaningful learning signal rather than incidental structure. Sim-to-sim transfer to MuJoCo and preliminary hardware experiments further demonstrate cross-environment generalization. These results show that embedding interpretable balance structure into the learning framework substantially reduces time spent in failure states and broadens the envelope of autonomous recovery.
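The asymmetric setup described in the abstract (a critic that sees privileged balance metrics, an actor that sees only proprioception) can be sketched roughly as follows. Everything here is a hypothetical illustration: the `BalanceMetrics` container, the observation layout, the Gaussian-kernel reward, and the `sigma` value are assumptions chosen for clarity, not the paper's actual observation spaces or reward terms.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BalanceMetrics:
    """Privileged quantities available only in simulation (hypothetical layout)."""
    capture_point_xy: Tuple[float, float]
    com_position: Tuple[float, float, float]
    com_velocity: Tuple[float, float, float]
    centroidal_momentum: Tuple[float, float, float, float, float, float]


def actor_observation(proprioception: Sequence[float]) -> List[float]:
    # The actor sees proprioception only, so the same input is available on hardware.
    return list(proprioception)


def critic_observation(proprioception: Sequence[float],
                       metrics: BalanceMetrics) -> List[float]:
    # The critic additionally receives the balance metrics during training.
    return (list(proprioception)
            + list(metrics.capture_point_xy)
            + list(metrics.com_position)
            + list(metrics.com_velocity)
            + list(metrics.centroidal_momentum))


def capture_point_reward(capture_point_xy: Sequence[float],
                         support_center_xy: Sequence[float],
                         sigma: float = 0.25) -> float:
    """Assumed shaping term: 1.0 when the capture point sits at the support
    center, decaying smoothly with squared distance (sigma in meters)."""
    dx = capture_point_xy[0] - support_center_xy[0]
    dy = capture_point_xy[1] - support_center_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (sigma * sigma))
```

The point of the split is that the value estimate can exploit quantities that are hard to measure on a real robot, while the deployed policy never depends on them.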

Robotics & Embodied AI · Training Efficiency & Optimization · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
