This paper introduces a distributed quantum reinforcement learning (QRL) framework, MADQRL, designed to address the computational challenges of high-dimensional multi-agent environments in RL. MADQRL distributes the learning process across multiple agents, enabling independent learning and reducing the computational burden of joint training. Experiments on the cooperative-pong environment demonstrate that MADQRL achieves a ~10% improvement over other distributed strategies and a ~5% improvement over classical policy representation models.
Quantum reinforcement learning gets a distributed boost: spreading the learning load across multiple independently learning quantum agents yields a ~10% improvement over other distribution strategies on the cooperative-pong environment.
Reinforcement learning (RL) is one of the most practical ways to learn from real-life use cases. Its grounding in the cognitive methods used by humans makes it a widely accepted strategy in the field of artificial intelligence. Most environments used for RL are high-dimensional, and traditional RL algorithms become computationally expensive and struggle to learn effectively from such systems. Recent advances in the practical demonstration of quantum computing (QC) theories, such as compact encoding, enhanced representation and learning algorithms, random sampling, and the inherent stochastic nature of quantum systems, have opened up new directions for tackling these challenges. Quantum reinforcement learning (QRL) has gained significant traction over the past few years. However, current quantum hardware cannot yet handle such high-dimensional environments with complex multi-agent setups. To tackle this issue, we propose a distributed framework for QRL in which multiple agents learn independently, so that no individual machine carries the load of joint training. Our method works well for environments with disjoint sets of action and observation spaces, but can also be extended to other systems with reasonable approximations. We analyze the proposed method on the cooperative-pong environment, and our results indicate a ~10% improvement over other distribution strategies and a ~5% improvement over classical models of policy representation.
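To make the idea of independent learning over disjoint observation and action spaces concrete, here is a minimal sketch. It is not the paper's method: it swaps the quantum policy for a classical tabular learner and replaces cooperative-pong with a hypothetical single-step matching task. The names `IndependentAgent` and `train_independent`, the shared reward, and all hyperparameters are assumptions made for illustration only.

```python
import random


class IndependentAgent:
    """Tabular learner that only ever sees its own observation and action space."""

    def __init__(self, n_obs, n_actions, lr=0.1, epsilon=0.2, rng=None):
        # One value per (own observation, own action) pair; no joint table.
        self.q = [[0.0] * n_actions for _ in range(n_obs)]
        self.lr = lr
        self.epsilon = epsilon
        self.rng = rng or random.Random()

    def greedy(self, obs):
        # Index of the highest-valued action (first one wins on ties).
        row = self.q[obs]
        return max(range(len(row)), key=row.__getitem__)

    def act(self, obs):
        # Epsilon-greedy exploration over the agent's own actions.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[obs]))
        return self.greedy(obs)

    def update(self, obs, action, reward):
        # Single-step update toward the observed reward (no bootstrapping,
        # because the toy task has no next state).
        self.q[obs][action] += self.lr * (reward - self.q[obs][action])


def train_independent(n_obs=3, n_actions=3, steps=20000, seed=0):
    """Train two agents side by side on a hypothetical cooperative task."""
    env_rng = random.Random(seed)
    agents = [
        IndependentAgent(n_obs, n_actions, rng=random.Random(seed + 1 + i))
        for i in range(2)
    ]
    for _ in range(steps):
        # Each agent receives a private observation from its own space.
        observations = [env_rng.randrange(n_obs) for _ in agents]
        actions = [agent.act(obs) for agent, obs in zip(agents, observations)]
        # Shared team reward: how many agents matched their own observation.
        reward = sum(int(a == o) for a, o in zip(actions, observations))
        # Each agent updates from local information plus the shared reward.
        for agent, obs, action in zip(agents, observations, actions):
            agent.update(obs, action, reward)
    return agents
```

In this toy setting each agent stores 3 x 3 = 9 values, whereas a single joint learner over both agents would need a table over 9 joint observations and 9 joint actions, or 81 values. That gap is the kind of saving that independent learning is meant to provide, and it grows quickly with the number of agents.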