This paper introduces a reinforcement learning approach for humanoid robot recovery that explicitly incorporates classical balance metrics (capture point, center-of-mass state, and centroidal momentum) into the critic network and reward function. By using these metrics as privileged information during training, the RL agent learns a unified policy capable of diverse recovery strategies, from ankle adjustments to compliant falling and stand-up maneuvers. The resulting policy achieves a 93.4% recovery rate on the Unitree H1-2 in simulation and demonstrates sim-to-sim transfer, highlighting the importance of balance-informed structure for robust recovery behaviors.
Humanoid robots can recover from falls with a 93.4% success rate in simulation when classical balance principles are built into RL, enabling diverse strategies from ankle adjustments to compliant falling.
Humanoid robots remain vulnerable to falls and unrecoverable failure states, limiting their practical utility in unstructured environments. While reinforcement learning has demonstrated stand-up behaviors, existing approaches treat recovery as a pure task-reward problem without an explicit representation of the balance state. We present a unified RL policy that addresses this limitation by embedding classical balance metrics (capture point, center-of-mass state, and centroidal momentum) as privileged critic inputs and by shaping rewards directly around these quantities during training, while the actor relies solely on proprioception for zero-shot hardware transfer. Without reference trajectories or scripted contacts, a single policy spans the full recovery spectrum: ankle and hip strategies for small disturbances, corrective stepping under large pushes, and compliant falling with multi-contact stand-up using the hands, elbows, and knees. Trained on the Unitree H1-2 in Isaac Lab, the policy achieves a 93.4% recovery rate across randomized initial poses and unscripted fall configurations. An ablation study shows that removing the balance-informed structure causes stand-up learning to fail entirely, confirming that these metrics provide a meaningful learning signal rather than incidental structure. Sim-to-sim transfer to MuJoCo and preliminary hardware experiments further demonstrate cross-environment generalization. These results show that embedding interpretable balance structure into the learning framework substantially reduces time spent in failure states and broadens the envelope of autonomous recovery.
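To make the capture-point metric concrete, the following is a minimal sketch of how it could be computed and turned into a shaping term. It uses the standard linear inverted pendulum definition, in which the capture point is the center-of-mass ground position plus its velocity divided by the pendulum's natural frequency. The abstract does not give the paper's reward formulation, so the Gaussian-style reward, the `scale` parameter, and both function names are assumptions for illustration only.

```python
import math

GRAVITY = 9.81  # m/s^2


def capture_point(com_xy, com_vel_xy, com_height):
    """Instantaneous capture point under the linear inverted pendulum model.

    com_xy:      horizontal center-of-mass position (x, y) in meters
    com_vel_xy:  horizontal center-of-mass velocity (vx, vy) in m/s
    com_height:  center-of-mass height above the ground in meters (> 0)
    """
    omega = math.sqrt(GRAVITY / com_height)  # pendulum natural frequency
    return tuple(p + v / omega for p, v in zip(com_xy, com_vel_xy))


def capture_point_reward(cp_xy, support_center_xy, scale=0.25):
    """Hypothetical shaping term, not the paper's reward.

    Returns 1.0 when the capture point coincides with the support center
    and decays smoothly toward 0 as the distance grows.
    """
    dist = math.dist(cp_xy, support_center_xy)
    return math.exp(-((dist / scale) ** 2))


if __name__ == "__main__":
    # A robot with its center of mass 0.9 m high, drifting forward at 0.5 m/s.
    cp = capture_point((0.0, 0.0), (0.5, 0.0), 0.9)
    print("capture point:", cp)
    print("shaping reward:", capture_point_reward(cp, (0.0, 0.0)))
```

In this example the capture point lies roughly 0.15 m ahead of the center of mass, which is where the robot would need to place its center of pressure to come to rest. In a setup like the one described, a quantity of this kind would be fed to the critic and used in the reward during training, while the actor would see only proprioceptive observations.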