The paper introduces State-Action Embedding Continuous Dynamic Policy Programming (SAECDPP), a novel Actor-Critic RL method designed to improve sample efficiency and policy stability. SAECDPP employs a multi-scale encoder to generate informative state-action embeddings and uses dynamic relative entropy regularization to balance exploration and exploitation. Experiments on DeepMind Control Suite benchmarks demonstrate that SAECDPP outperforms state-of-the-art RL baselines in learning efficiency and policy robustness for complex robot control tasks.
SAECDPP achieves state-of-the-art RL performance on the DeepMind Control Suite by dynamically adjusting relative entropy regularization and learning multi-scale state-action embeddings.
This paper presents State-Action Embedding Continuous Dynamic Policy Programming (SAECDPP), a novel reinforcement learning (RL) approach that enhances sample efficiency and policy stability by integrating dynamic relative entropy regularization and multi-scale state-action embeddings within an Actor-Critic architecture. The proposed method incorporates a multi-scale encoder that builds state-action embeddings to effectively extract the dynamic features of the target system. We further employ relative entropy regularization with dynamic temperature adjustment, which ensures learning stability while maintaining an effective balance between exploration and exploitation during training. Extensive experimental evaluation on six DeepMind Control Suite benchmarks demonstrates that the proposed method outperforms state-of-the-art RL baselines in both learning efficiency and policy robustness, establishing SAECDPP as an efficient and robust approach to complex robot control tasks.
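The abstract does not specify the encoder architecture, so the following PyTorch sketch shows one plausible reading of a multi-scale state-action encoder: "multi-scale" is taken to mean parallel MLP branches of different capacities whose features are fused into a single embedding. The class name `MultiScaleEncoder` and the `scales` and `embed_dim` parameters are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Hypothetical multi-scale state-action encoder.

    Assumption: "multi-scale" is read as parallel MLP branches of
    different widths whose features are concatenated and fused; the
    paper may define the scales differently (e.g., temporal scales).
    """

    def __init__(self, state_dim, action_dim, scales=(64, 128, 256), embed_dim=128):
        super().__init__()
        in_dim = state_dim + action_dim
        # One branch per scale; each branch sees the full (state, action) input.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, h), nn.ReLU(), nn.Linear(h, h), nn.ReLU())
            for h in scales
        )
        # Fuse the per-scale features into a single state-action embedding.
        self.fuse = nn.Linear(sum(scales), embed_dim)

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        feats = torch.cat([branch(x) for branch in self.branches], dim=-1)
        return self.fuse(feats)
```

A critic head could then score this embedding directly, so both actor and critic share the dynamics-aware representation.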
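Likewise, the dynamic temperature adjustment is named but not defined in the abstract. The sketch below pairs a KL-regularized (relative entropy) actor loss with an adaptive temperature rule in the spirit of adaptive-KL schemes such as PPO's; the thresholds, update factor, and function names are assumptions, not the paper's actual schedule.

```python
import torch

def regularized_policy_loss(q_values, log_probs_new, log_probs_old, eta):
    """KL-regularized actor objective: maximize Q while staying close
    to the previous policy (relative entropy regularization).

    Loss = -E[Q(s, a)] + eta * E[log pi_new(a|s) - log pi_old(a|s)],
    with actions sampled from the current policy.
    """
    kl = (log_probs_new - log_probs_old).mean()  # sample KL estimate
    return -q_values.mean() + eta * kl, kl.detach()

def adjust_temperature(eta, observed_kl, target_kl=0.01, factor=1.5,
                       eta_min=1e-4, eta_max=10.0):
    """Hypothetical dynamic temperature rule: raise eta when the policy
    moves too fast (unstable learning), lower it when the policy is
    overly conservative (too little exploration of new behaviors).
    """
    if observed_kl > 2.0 * target_kl:
        eta *= factor
    elif observed_kl < 0.5 * target_kl:
        eta /= factor
    return float(min(max(eta, eta_min), eta_max))
```

In a training loop, one would step the actor optimizer on the returned loss and then call `adjust_temperature` with the observed KL before the next update, so the regularization strength tracks how quickly the policy is changing.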