Apr 30, 2026 | arXiv:2604.27378

Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms

Zhenjie Ren, Xiaoli Wei, Xiang Yu, Xun Yu Zhou

AI Summary

This paper develops q-learning algorithms for mean-field control (MFC) problems with controlled common noise, building on a relaxed control formulation and martingale conditions for the value function and the Iq-function. Because the data in the relaxed control formulation are not observable in practice, the authors quantify the error incurred by replacing them with observable data from the exploratory formulation under discretely sampled actions. The proposed Actor-Critic q-learning algorithm updates the policy in the Actor-step using the iteration rule induced by the improved Iq-function, and updates the value function and Iq-function in the Critic-step using the martingale orthogonality condition. Convergence of the inner iterations in the Actor-step is established in an infinite-horizon linear quadratic (LQ) framework, and the algorithms perform satisfactorily in two examples, within and beyond the LQ framework.

Key Contribution

q-learning is extended to mean-field control problems with controlled common noise, with a quantified error for replacing the unobservable relaxed-control data by observable exploratory data under discretely sampled actions.

Abstract

This paper continues the work of Ren et al. (2026) and devises q-learning algorithms for mean-field control (MFC) with controlled common noise. Based on the relaxed control formulation, we first establish the martingale condition of the value function and the Iq-function by evaluating along the conditional state distributions generated by all test policies. As the data in the relaxed control formulation are not observable in practice, we quantify the error incurred when they are replaced by the observable data in the exploratory formulation under discretely sampled actions. This, together with a two-layer fixed point characterization of an optimal policy in Ren et al. (2026), allows us to propose several algorithms, including the Actor-Critic q-learning algorithm, in which the policy is updated in the Actor-step based on the iteration rule induced by the improved Iq-function, and the value function and Iq-function are updated in the Critic-step based on the martingale orthogonality condition using the data from the exploratory formulation. We also establish the convergence of the inner iterations in the Actor-step in an infinite-horizon linear quadratic (LQ) framework. In two examples, within and beyond the LQ framework, our q-learning algorithms are implemented with satisfactory performance.
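
To make the idea of a martingale orthogonality update concrete, the following is a minimal sketch in a simplified single-agent setting, not the paper's mean-field algorithm. It assumes a discounted problem with rate `beta`, in which the discounted value plus the running integral of reward minus q-value is a martingale, so the increment `dJ + (r - q - beta*J) dt` should be orthogonal to any test process. The function names, the scalar stand-ins for the value function and Iq-function, and the sign convention are all assumptions made for illustration.

```python
import numpy as np

def martingale_increments(J_vals, q_vals, rewards, dt, beta):
    """Discretized increments delta_k = J_{k+1} - J_k + (r_k - q_k - beta*J_k)*dt.

    J_vals has length K+1 (value estimates along one sampled path);
    q_vals and rewards have length K (one entry per time step).
    """
    J = np.asarray(J_vals, dtype=float)
    q = np.asarray(q_vals, dtype=float)
    r = np.asarray(rewards, dtype=float)
    return J[1:] - J[:-1] + (r - q - beta * J[:-1]) * dt

def orthogonality_step(theta, test_vals, increments, lr):
    """One stochastic-approximation step: theta + lr * sum_k xi_k * delta_k.

    test_vals has shape (K, d): the test process xi evaluated at each step,
    e.g. the gradient of the parametrized value function (an assumption here).
    """
    theta = np.asarray(theta, dtype=float)
    xi = np.asarray(test_vals, dtype=float)
    delta = np.asarray(increments, dtype=float)
    return theta + lr * xi.T @ delta
```

When the value and q estimates are already consistent with the data, the increments vanish and the step leaves the parameters unchanged; otherwise the parameters move in the direction that reduces the correlation between the test process and the increments.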

Training Efficiency & Optimization

Citation Metrics

Citations: 1

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
