Apr 21, 2026 · arXiv:2604.19102

Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

Yuan Wu, Yuanye Wu, Keyi Wang, Linqi Ye, Boyang Xing

AI Summary

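This paper introduces a reinforcement learning framework for training a humanoid robot to perform five distinct gaits (walking, goose-stepping, running, stair climbing, and jumping) using one consistent policy structure, action space, and reward formulation. The key innovation is a selective Adversarial Motion Prior (AMP) strategy: AMP is applied only to the periodic, stability-critical gaits (walking, goose-stepping, stair climbing), where it accelerates convergence and suppresses erratic behavior, and is omitted for the highly dynamic gaits (running, jumping), where it would over-constrain the motion. The authors report that selective AMP outperforms a uniform AMP policy across all five gaits, with faster convergence, lower tracking error, and higher success rates on the stability-focused gaits, and that the trained policies transfer zero-shot from simulation to a physical 12-DOF humanoid robot.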

Key Contribution

A humanoid robot can learn diverse gaits such as walking, running, and stair climbing within one unified reinforcement learning framework when motion priors are applied selectively: to the stability-critical gaits, and not to the highly dynamic ones.

Abstract

Learning diverse locomotion skills for humanoid robots in a unified reinforcement learning framework remains challenging due to the conflicting requirements of stability and dynamic expressiveness across different gaits. We present a multi-gait learning approach that enables a humanoid robot to master five distinct gaits -- walking, goose-stepping, running, stair climbing, and jumping -- using a consistent policy structure, action space, and reward formulation. The key contribution is a selective Adversarial Motion Prior (AMP) strategy: AMP is applied to periodic, stability-critical gaits (walking, goose-stepping, stair climbing) where it accelerates convergence and suppresses erratic behavior, while being deliberately omitted for highly dynamic gaits (running, jumping) where its regularization would over-constrain the motion. Policies are trained via PPO with domain randomization in simulation and deployed on a physical 12-DOF humanoid robot through zero-shot sim-to-real transfer. Quantitative comparisons demonstrate that selective AMP outperforms a uniform AMP policy across all five gaits, achieving faster convergence, lower tracking error, and higher success rates on stability-focused gaits without sacrificing the agility required for dynamic ones.
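The selective AMP idea described in the abstract can be illustrated with a minimal sketch of per-gait reward gating. This is not the authors' implementation: the function names, the reward weights `w_task` and `w_style`, and the choice to pass the task reward through unchanged for dynamic gaits are all hypothetical. The style-reward formula is the least-squares form from the original AMP formulation, and whether this paper uses the same form is an assumption. Only the split of gaits into AMP and non-AMP groups comes from the abstract.

```python
# Gait grouping taken from the abstract: AMP is applied to the periodic,
# stability-critical gaits and omitted for the highly dynamic ones.
AMP_GAITS = frozenset({"walking", "goose-stepping", "stair climbing"})
DYNAMIC_GAITS = frozenset({"running", "jumping"})


def amp_style_reward(disc_score: float) -> float:
    """Style reward from a discriminator score d.

    Uses the least-squares AMP form max(0, 1 - 0.25 * (d - 1)^2).
    Assumption: the paper's exact style reward is not given in the abstract.
    """
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def total_reward(
    gait: str,
    task_reward: float,
    disc_score: float,
    w_task: float = 0.5,   # hypothetical weight
    w_style: float = 0.5,  # hypothetical weight
) -> float:
    """Combine task and style rewards, gating the AMP term by gait."""
    if gait not in AMP_GAITS | DYNAMIC_GAITS:
        raise ValueError(f"unknown gait: {gait!r}")
    if gait in AMP_GAITS:
        # Stability-critical gait: blend task reward with the AMP style reward.
        return w_task * task_reward + w_style * amp_style_reward(disc_score)
    # Dynamic gait: the AMP term is omitted, so only the task reward remains.
    return task_reward
```

In this sketch a walking step with `task_reward=1.0` and a discriminator score of `1.0` receives the full blended reward, while a running step ignores the discriminator score entirely. A real training setup would also need to decide how to keep reward scales comparable across the two gait groups, which the abstract does not specify.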

Robotics & Embodied AI · Training Efficiency & Optimization · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 10

Year: 2026

Venue: N/A
