Honda RI · MiniMax · OUC · WHU · Mar 1, 2026

Transferring Policy of Offline Reinforcement Learning From Hybrid Dataset to Real World via Progressive Neural Network

Pengyu Zhao, Pengyu Zhao, Zheng Fang, Tongxu Ai, Eric Nichols, Randy Gomez, Bo He, Guangliang Li

AI Summary

This paper addresses the challenge of transferring offline RL policies learned from hybrid (real and simulated) datasets to real-world robotic manipulation tasks. The authors use Progressive Neural Networks (PNNs) to mitigate policy extrapolation errors and the sim-to-real gap by transferring knowledge from the hybrid offline policy to an online learning agent in the real world. Experiments on two robotic manipulation tasks with a 6-DOF Ned robotic arm show that the hybrid dataset accelerates offline learning and that PNN-based transfer improves adaptation during online learning.

Key Contribution

A policy learned from a mix of real and simulated data can be transferred effectively to real-world robot tasks using progressive neural networks, enabling safer and more efficient online adaptation.

Abstract

Offline reinforcement learning (Offline RL) provides a compelling solution for applying RL in high-risk or resource-constrained real-world domains such as healthcare, autonomous driving, and robotic manipulation, where online exploration can be unsafe or impractical. However, Offline RL faces critical challenges arising from limited data coverage and potential distributional mismatch between the pre-training dataset and the real-world environment. In this paper, we propose letting an agent learn from a hybrid dataset that combines high-quality real-world data with high-diversity simulation data. We assume that the dynamics of the simulation and the real world do not match, but that the state space is the same. To address the policy extrapolation error and the potentially catastrophic failures caused by out-of-distribution actions and the sim-to-real gap, we use progressive neural networks (PNNs) to transfer the offline policy to the real world. Results in two robotic manipulation tasks with a six-degree-of-freedom Ned robotic arm show that the hybrid dataset facilitates faster offline learning and better adaptation to real-world tasks during online learning. Further analysis shows that transferring the offline policy via PNN not only retains the policy learned from the hybrid dataset and bridges the gap between simulated and real data, but also allows the agent to explore a more diverse distribution of samples during online learning.
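The abstract does not give the network details, so the following is only a minimal sketch of the general progressive-network idea: a first column (standing in for the offline policy) is kept frozen, and a second column for online learning receives the first column's hidden activations through a lateral connection. The class name `ProgressivePolicy`, the layer sizes, the single hidden layer, and the `tanh` action squashing are all assumptions made for illustration, not the authors' architecture.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_out, n_in, rng):
    """Random weights and zero bias (placeholder for trained parameters)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ProgressivePolicy:
    """Hypothetical two-column progressive network.

    Column 1: offline policy, frozen during online learning.
    Column 2: online column, fed by a lateral connection from column 1.
    """

    def __init__(self, state_dim, hidden_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Frozen column (would be loaded from offline training in practice).
        self.col1_hidden = init_layer(hidden_dim, state_dim, rng)
        # New column trained online.
        self.col2_hidden = init_layer(hidden_dim, state_dim, rng)
        self.col2_out = init_layer(action_dim, hidden_dim, rng)
        # Lateral connection: column-1 hidden features -> column-2 output.
        self.lateral = init_layer(action_dim, hidden_dim, rng)

    def trainable_parameters(self):
        """Only the new column and the lateral connection are updated."""
        return {
            "col2_hidden": self.col2_hidden,
            "col2_out": self.col2_out,
            "lateral": self.lateral,
        }

    def act(self, state):
        h1 = relu(linear(*self.col1_hidden, state))  # frozen features
        h2 = relu(linear(*self.col2_hidden, state))  # online features
        own = linear(*self.col2_out, h2)
        transferred = linear(*self.lateral, h1)
        # Sum both pathways, then squash to a bounded action range.
        return [math.tanh(a + b) for a, b in zip(own, transferred)]
```

Because the first column is never updated, the behaviour learned offline stays available to the online column, which is one way to read the abstract's claim that PNN transfer retains the policy learned from the hybrid dataset. A call such as `ProgressivePolicy(state_dim=4, hidden_dim=8, action_dim=2).act([0.1, 0.2, 0.3, 0.4])` returns a two-element action with each component in [-1, 1].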

RLHF & Preference Learning · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 27

Year: 2026

Venue: IEEE Robotics and Automation Letters
