The paper introduces Guided Model-Based Policy Search (GMBPS), a reinforcement learning algorithm that combines model-free (MF) and model-based (MB) approaches to improve the learning efficiency and adaptability of physical robots. GMBPS incorporates the global MF value function into the MB objective function and guides the MB policy search with the MF policy, overcoming the suboptimality of MB methods and accelerating learning. Experimental results on a 6-DOF UR5e robot arm demonstrate that GMBPS enables rapid learning of reaching tasks, with better policies and higher learning efficiency than traditional methods.
Robots can now learn complex motor skills in minutes instead of hours, thanks to a new hybrid reinforcement learning algorithm that fuses model-free and model-based approaches.
Reinforcement learning has recently achieved impressive success in allowing robots to learn complex motor skills in simulation environments. However, most of these successes are difficult to transfer to physical robots, since current algorithms require large amounts of real-world training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic brain implementation. This approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods and speeding up the learning rate of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks. This helps to accurately control the action cycle time and ensures the feasibility of MB planning for physical robots. The simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks in a few minutes, with better policies and higher learning efficiency.

Note to Practitioners—Reinforcement learning is becoming a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plant. However, low learning efficiency severely limits its application to practical robots, which must quickly adapt to dynamically changing environments in micro-data situations. To solve the inefficiency of physical robots learning from scratch, this paper proposes an MF and MB fusion control algorithm inspired by a hypothetical MF and MB actor-critic brain implementation. The motion decision process is modeled as an optimization problem with inequality constraints.
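The one-step predictive control idea can be illustrated with a minimal sketch. The names below (`model`, `plan_action`, `one_step_predictive_control`) are hypothetical and not from the paper; the sketch only shows the core trick of planning against the predicted next state rather than the delayed observation, so the planned action is valid by the time it is actually executed.

```python
import numpy as np

def one_step_predictive_control(state, last_action, plan_action, model):
    """Sketch of a one-step predictive control cycle (hypothetical API).

    To compensate for sensorimotor delay, the action for step t+1 is
    planned from the state *predicted* for t+1, not from the (already
    stale) observation at step t.
    """
    # Predict where the system will be once the currently executing
    # action finishes, i.e. one control cycle ahead.
    predicted_next = model(state, last_action)
    # Plan against the predicted state so the plan matches the state
    # at which the action will actually be applied.
    return plan_action(predicted_next)
```

In practice `model` would be the learned dynamics model and `plan_action` the MB policy search; here they are stand-in callables.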
The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term one and thereby overcoming the suboptimality of conventional MB methods. The MB policy is searched with the quadratic penalty method, guided by the MF policy, which improves the quality of the policy at every decision-making step. Moreover, since the system dynamics are fitted with a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also offers a feasible solution for controlling robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.
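The combined objective described above can be sketched as follows. This is a simplified illustration under assumed interfaces, not the paper's implementation: `model`, `reward`, `v_mf`, `pi_mf`, and the penalty weight `rho` are hypothetical names for the learned dynamics, per-step reward, global MF value function, MF policy, and quadratic-penalty coefficient.

```python
import numpy as np

def guided_mb_objective(actions, state, model, reward, v_mf, pi_mf, rho):
    """Short-horizon MB objective augmented with the MF value function
    and a quadratic penalty toward the MF policy (illustrative sketch).

    actions: (H, action_dim) candidate action sequence
    model:   learned dynamics, s' = model(s, a)
    reward:  per-step reward r(s, a)
    v_mf:    global model-free value function V(s)
    pi_mf:   model-free policy, a = pi_mf(s)
    rho:     quadratic-penalty weight
    """
    total, s = 0.0, state
    for a in actions:
        # The quadratic penalty keeps the MB search close to the MF
        # policy, so the MF policy guides the search at every step.
        total += reward(s, a) - rho * np.sum((a - pi_mf(s)) ** 2)
        s = model(s, a)
    # The terminal MF value extends the short planning horizon into a
    # long-term objective, countering the usual MB suboptimality.
    return total + v_mf(s)
```

A planner would maximize this objective over candidate action sequences at each control cycle; the penalty weight trades off model-based exploration against trust in the MF policy.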