The paper introduces State-Action Embedding Continuous Dynamic Policy Programming (SAECDPP), a novel Actor-Critic RL method designed to improve sample efficiency and policy stability. SAECDPP employs a multi-scale encoder to generate informative state-action embeddings and uses dynamic relative entropy regularization to balance exploration and exploitation. Experiments on DeepMind Control Suite benchmarks demonstrate that SAECDPP outperforms state-of-the-art RL baselines in learning efficiency and policy robustness for complex robot control tasks.
SAECDPP achieves state-of-the-art RL performance on the DeepMind Control Suite by dynamically adjusting relative entropy regularization and learning multi-scale state-action embeddings.
This paper presents State-Action Embedding Continuous Dynamic Policy Programming (SAECDPP), a novel reinforcement learning (RL) approach that enhances sample efficiency and policy stability by integrating dynamic relative entropy regularization and multi-scale state-action embeddings within an Actor-Critic architecture. The proposed method incorporates a multi-scale encoder that builds state-action embeddings to effectively extract the dynamic features of the target system. We further employ relative entropy regularization with dynamic temperature adjustment, which ensures learning stability while maintaining an effective balance between exploration and exploitation during training. Extensive experimental evaluation on six DeepMind Control Suite benchmarks demonstrates that the proposed method outperforms state-of-the-art RL baselines in both learning efficiency and policy robustness, establishing SAECDPP as an efficient and robust approach to complex robot control tasks.
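The abstract does not specify the encoder architecture, so the following PyTorch sketch shows one plausible reading of a multi-scale state-action encoder: "multi-scale" is taken to mean parallel MLP branches of different capacities whose features are fused into a single embedding. The class name `MultiScaleEncoder` and the `scales` and `embed_dim` parameters are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Hypothetical multi-scale state-action encoder.

    Assumption: "multi-scale" is read as parallel MLP branches of
    different widths whose features are concatenated and fused; the
    paper may define the scales differently (e.g., temporal scales).
    """

    def __init__(self, state_dim, action_dim, scales=(64, 128, 256), embed_dim=128):
        super().__init__()
        in_dim = state_dim + action_dim
        # One branch per scale; each branch sees the full (state, action) input.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, h), nn.ReLU(), nn.Linear(h, h), nn.ReLU())
            for h in scales
        )
        # Fuse the per-scale features into a single state-action embedding.
        self.fuse = nn.Linear(sum(scales), embed_dim)

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        feats = torch.cat([branch(x) for branch in self.branches], dim=-1)
        return self.fuse(feats)
```

A critic head could then score this embedding directly, so both actor and critic share the dynamics-aware representation.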
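Likewise, the dynamic temperature adjustment is named but not defined in the abstract. The sketch below pairs a KL-regularized (relative entropy) actor loss with an adaptive temperature rule in the spirit of adaptive-KL schemes such as PPO's; the thresholds, update factor, and function names are assumptions, not the paper's actual schedule.

```python
import torch

def regularized_policy_loss(q_values, log_probs_new, log_probs_old, eta):
    """KL-regularized actor objective: maximize Q while staying close
    to the previous policy (relative entropy regularization).

    Loss = -E[Q(s, a)] + eta * E[log pi_new(a|s) - log pi_old(a|s)],
    with actions sampled from the current policy.
    """
    kl = (log_probs_new - log_probs_old).mean()  # sample KL estimate
    return -q_values.mean() + eta * kl, kl.detach()

def adjust_temperature(eta, observed_kl, target_kl=0.01, factor=1.5,
                       eta_min=1e-4, eta_max=10.0):
    """Hypothetical dynamic temperature rule: raise eta when the policy
    moves too fast (unstable learning), lower it when the policy is
    overly conservative (too little exploration of new behaviors).
    """
    if observed_kl > 2.0 * target_kl:
        eta *= factor
    elif observed_kl < 0.5 * target_kl:
        eta /= factor
    return float(min(max(eta, eta_min), eta_max))
```

In a training loop, one would step the actor optimizer on the returned loss and then call `adjust_temperature` with the observed KL before the next update, so the regularization strength tracks how quickly the policy is changing.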