This paper introduces Deep Reinforcement Learning with Reference (DRLR), a framework that enhances imitation bootstrapped reinforcement learning (IBRL) by modifying the action selection module to provide a calibrated Q-value, reducing bootstrapping error. The authors also replace TD3 with SAC to prevent convergence to suboptimal policies. Empirical validation on simulated bucket loading and drawer opening tasks, along with a real-world wheel loader bucket loading task, demonstrates DRLR's robustness across varying state-action dimensions and demonstration qualities, as well as its successful sim-to-real transfer.
Calibrating Q-values in imitation-bootstrapped RL lets robots learn faster and avoid getting stuck in bad solutions.
This paper proposes an exploration-efficient deep reinforcement learning with reference (DRLR) policy framework for learning robotic tasks by incorporating demonstrations. The DRLR framework builds on the imitation bootstrapped reinforcement learning (IBRL) algorithm. Here, we propose to improve IBRL by modifying the action selection module. The proposed action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Furthermore, to prevent the reinforcement learning (RL) policy from converging to a sub-optimal policy, soft actor–critic (SAC) is used as the RL policy instead of twin delayed deep deterministic policy gradient (TD3). The effectiveness of our method in mitigating the bootstrapping error and preventing overfitting is empirically validated on two robotics tasks, bucket loading and drawer opening, which require extensive interaction with the environment. Simulation results also demonstrate the robustness of the DRLR framework across tasks with both low and high state–action dimensions and varying demonstration qualities. To evaluate the developed framework on a real-world industrial robotics task, the bucket loading task is deployed on a real wheel loader. The sim-to-real results validate the successful deployment of the DRLR framework.
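The abstract does not spell out how the calibrated Q-value enters action selection, but the IBRL-style mechanism it modifies compares an imitation-policy proposal against an RL-policy proposal by Q-value. The sketch below illustrates that idea under assumptions: the function names (`select_action`, `il_policy`, `rl_policy`, `critics`) are illustrative, and calibration is approximated here by a pessimistic minimum over twin critics, a common way to curb Q-value overestimation; it is not the paper's exact method.

```python
# Hypothetical sketch of IBRL-style action selection with a calibrated
# Q-value. All names are illustrative; the min-over-critics calibration
# is an assumption, not the DRLR paper's exact formulation.

def select_action(state, il_policy, rl_policy, critics):
    """Propose one action from the imitation-learning (IL) policy and one
    from the RL policy, then pick whichever scores higher under a
    calibrated (pessimistic) Q-estimate."""
    candidates = [il_policy(state), rl_policy(state)]

    def calibrated_q(s, a):
        # Minimum over twin critics gives a conservative value estimate,
        # which helps suppress the bootstrapping error the paper targets.
        return min(q(s, a) for q in critics)

    return max(candidates, key=lambda a: calibrated_q(state, a))
```

A plain maximum over a single (possibly overestimating) critic would let inflated Q-values steer exploration toward poor actions; the pessimistic aggregate makes the IL-vs-RL comparison less sensitive to such estimation error.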