This paper analyzes independent policy-gradient (PG) learning in N-player linear-quadratic (LQ) stochastic differential games, where each player's policy depends only on its own state. The authors establish global linear convergence of independent PG methods to an equilibrium by demonstrating that the LQ game admits an α-potential structure, with α quantifying interaction asymmetry. They show that for symmetric interactions, independent PG converges to an affine distributed equilibrium, while for asymmetric interactions, projected PG converges to an approximate equilibrium with suboptimality proportional to the asymmetry.
Independent learners in multi-agent games can provably converge to equilibria, even with asymmetric interactions, thanks to a novel α-potential structure.
We analyze independent policy-gradient (PG) learning in $N$-player linear-quadratic (LQ) stochastic differential games. Each player employs a distributed policy that depends only on its own state and updates the policy independently using the gradient of its own objective. We establish global linear convergence of these methods to an equilibrium by showing that the LQ game admits an $\alpha$-potential structure, with $\alpha$ determined by the degree of pairwise interaction asymmetry. For pairwise-symmetric interactions, we construct an affine distributed equilibrium by minimizing the potential function and show that independent PG methods converge globally to this equilibrium, with complexity scaling linearly in the population size and logarithmically in the desired accuracy. For asymmetric interactions, we prove that independent projected PG algorithms converge linearly to an approximate equilibrium, with suboptimality proportional to the degree of asymmetry. Numerical experiments confirm the theoretical results across both symmetric and asymmetric interaction networks.
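To illustrate the symmetric case, here is a minimal sketch of independent gradient play in a static quadratic game. It is a deliberate simplification and not the paper's algorithm: there are no dynamics, no noise, and no policy parameterization. The cost form, the function name `independent_gradient_play`, and the parameters `Q`, `W`, `b`, `step`, and `iters` are all assumptions made for illustration. With a symmetric interaction matrix `W`, the game has the exact potential $\Phi(x) = \tfrac{1}{2} x^\top (\mathrm{diag}(Q) + W) x - b^\top x$, so each player descending its own gradient is the same as gradient descent on $\Phi$.

```python
import numpy as np

def independent_gradient_play(Q, W, b, step=0.1, iters=500):
    """Toy static game (hypothetical, not the paper's LQ differential game).

    Player i controls the scalar x[i] and minimizes
        J_i(x) = 0.5*Q[i]*x[i]**2 + x[i]*sum_j W[i, j]*x[j] - b[i]*x[i],
    where W has a zero diagonal.
    """
    x = np.zeros(len(b))
    for _ in range(iters):
        # Entry i is dJ_i/dx_i: each player uses only its own gradient.
        own_grads = Q * x + W @ x - b
        # All players update simultaneously, without coordination.
        x = x - step * own_grads
    return x

rng = np.random.default_rng(0)
N = 5
A = rng.uniform(-0.1, 0.1, size=(N, N))
W = (A + A.T) / 2          # pairwise-symmetric interactions
np.fill_diagonal(W, 0.0)   # no self-interaction term in W
Q = np.ones(N)             # own-cost curvature of each player
b = rng.uniform(-1.0, 1.0, size=N)

x_star = independent_gradient_play(Q, W, b)

# The Nash equilibrium solves the stacked first-order conditions
# (diag(Q) + W) x = b, which is also the minimizer of the potential.
x_nash = np.linalg.solve(np.diag(Q) + W, b)
print(np.max(np.abs(x_star - x_nash)))
```

In this toy setting the iteration is linear, and the weak interactions keep `diag(Q) + W` positive definite, so the error contracts geometrically. That mirrors, in a much simpler setting, the linear convergence rate described in the abstract. Replacing `W` with a non-symmetric matrix removes the exact potential, which is the situation the $\alpha$-potential analysis is designed to handle.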