This paper introduces an entropy-guided distributional reinforcement learning method to mitigate overestimation bias and high uncertainty in long-horizon robotic tasks. The authors dynamically adjust the discount factor in the truncated quantile critics algorithm based on policy entropy, reflecting the agent's learning status. The approach demonstrates an 11% improvement in average evaluation return over fixed-discount-factor methods on DeepMind Control Suite and Gymnasium robotics environments, showcasing enhanced sample efficiency and adaptability.
By dynamically tuning the discount factor based on policy entropy, RL agents can learn more stably and efficiently in complex robotic tasks, outperforming traditional fixed-discount approaches by 11% in average evaluation return.
This study proposes a novel approach to enhance the stability and performance of reinforcement learning (RL) in long-horizon tasks. Overestimation bias in value function estimation and high uncertainty within environments make it difficult to determine the optimal action. To address this, we improve the truncated quantile critics algorithm by managing uncertainty in robotic applications. Our dynamic method adjusts the discount factor based on policy entropy, allowing for fine-tuning that reflects the agent’s learning status. This enables the existing algorithm to learn stably even in scenarios with limited training data, ensuring more robust adaptation. By leveraging policy entropy loss, this approach effectively boosts confidence in predicting future rewards. Our experiments demonstrated an 11% increase in average evaluation return compared to traditional fixed-discount-factor approaches in the DeepMind Control Suite and Gymnasium robotics environments. This approach significantly enhances sample efficiency and adaptability in complex long-horizon tasks, highlighting the effectiveness of entropy-guided RL in navigating challenging and uncertain environments.
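The abstract does not spell out the exact mapping from policy entropy to the discount factor, so the sketch below is only an illustration of the general idea: a high-entropy (uncertain, early-training) policy gets a shorter effective horizon via a lower discount factor, while a confident low-entropy policy trusts longer-horizon returns. The function name, the linear interpolation, and the `gamma_min`/`gamma_max` bounds are all assumptions, not the paper's actual schedule.

```python
def entropy_guided_discount(policy_entropy: float,
                            max_entropy: float,
                            gamma_min: float = 0.95,
                            gamma_max: float = 0.995) -> float:
    """Map policy entropy to a discount factor (illustrative sketch).

    High entropy -> gamma near gamma_min (shorter effective horizon);
    low entropy  -> gamma near gamma_max (longer effective horizon).
    """
    # Normalize entropy to [0, 1]; clamp to guard against values
    # slightly outside the nominal range.
    ratio = min(max(policy_entropy / max_entropy, 0.0), 1.0)
    # Linear interpolation between the two discount bounds.
    return gamma_max - ratio * (gamma_max - gamma_min)


# Early training: near-maximal entropy, so the agent discounts heavily.
early_gamma = entropy_guided_discount(policy_entropy=0.95, max_entropy=1.0)
# Late training: low entropy, so the agent values distant rewards more.
late_gamma = entropy_guided_discount(policy_entropy=0.05, max_entropy=1.0)
```

In a TQC-style training loop, a value like this would replace the fixed `gamma` in the distributional Bellman target each update, recomputed from the current policy's entropy estimate.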