Feb 19, 2026 | arXiv:2602.17062

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

AI Summary

The paper introduces Successive Sub-value Q-learning (S2Q) for cooperative MARL. Existing value decomposition methods rely on a single optimal action and adapt poorly when the underlying value function shifts during training, so they often converge to suboptimal policies. S2Q learns multiple sub-value functions to retain alternative high-value actions, which supports persistent exploration and quicker adaptation to changing optima. In experiments on challenging MARL benchmarks, S2Q outperforms existing MARL algorithms in adaptability and overall performance.

Key Contribution

MARL agents can learn to adapt to shifting optima by remembering and exploring multiple high-value actions, rather than converging to a single, potentially brittle, optimal policy.

Abstract

Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often converging to suboptimal policies. To address this limitation, we propose Successive Sub-value Q-learning (S2Q), which learns multiple sub-value functions to retain alternative high-value actions. Incorporating these sub-value functions into a Softmax-based behavior policy, S2Q encourages persistent exploration and enables $Q^{\text{tot}}$ to adjust quickly to the changing optima. Experiments on challenging MARL benchmarks confirm that S2Q consistently outperforms various MARL algorithms, demonstrating improved adaptability and overall performance. Our code is available at https://github.com/hyeon1996/S2Q.
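The idea of a Softmax-based behavior policy over retained high-value actions can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). It assumes, purely for illustration, that each sub-value function has already produced one candidate action with a scalar value estimate; the names `softmax`, `sample_candidate`, `candidate_values`, and `temperature` are hypothetical and do not come from the paper.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_candidate(candidate_values, temperature=1.0, rng=random):
    """Sample the index of one retained candidate action.

    candidate_values: hypothetical value estimates, one per retained
    candidate (e.g. one per sub-value function). Higher-valued
    candidates are chosen more often, but lower-valued ones keep a
    nonzero probability, so they continue to be explored.
    """
    probs = softmax(candidate_values, temperature)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


if __name__ == "__main__":
    # Three retained candidates with assumed value estimates.
    index, probs = sample_candidate([1.0, 2.0, 3.0], temperature=1.0)
    print("chosen candidate:", index)
    print("selection probabilities:", probs)
```

Under these assumptions, a lower temperature concentrates sampling on the currently best candidate, while a higher temperature spreads exploration across the alternatives. If a previously suboptimal candidate becomes the new optimum, it is already being visited, which is the intuition behind retaining suboptimal actions.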

Architecture Design (Transformers, SSMs, MoE) | Open-Source Models & Weights | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
