May 28, 2026 · arXiv:2605.30190

Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents

AI Summary

MF-Diffuser tackles offline multi-agent RL by planning in the Wasserstein space of trajectory distributions, using a value-weighted chaotic entropy objective and hierarchical coarse-to-fine denoising. This approach leverages mean-field theory to represent the full population dynamics with a small subset of agents, mitigating the curse of dimensionality. Theoretical analysis provides suboptimality bounds and Nash equilibrium convergence guarantees, while experiments demonstrate superior performance, especially with suboptimal data and large agent populations ($N \geq 10^3$).

Key Contribution

Scaling offline MARL to thousands of agents is now tractable: MF-Diffuser uses mean-field theory to plan in trajectory distribution space, sidestepping the curse of dimensionality.

Abstract

Diffusion-based planning has achieved strong results in single-agent offline reinforcement learning, yet scaling to many-agent systems remains intractable due to the curse of dimensionality in the joint trajectory space. We introduce MF-Diffuser, a framework that lifts trajectory planning to the Wasserstein space of trajectory distributions, where the propagation of chaos ensures a small representative subset of agents captures the full population dynamics. Our approach features a value-weighted chaotic entropy objective that reconciles generative fidelity with return maximization, and a hierarchical coarse-to-fine strategy that progressively grows the agent population during denoising. We establish end-to-end suboptimality bounds with four interpretable terms, revealing that mean-field approximation error scales as $O(H^2/\sqrt{N})$ while offline distribution shift provably does not grow with population size $N$, and prove the generated policy is an approximate mean-field Nash equilibrium with explicit convergence guarantees. Experiments on three mean-field RL benchmarks -- spanning stage games, sequential dynamics, and adversarial team competition -- show MF-Diffuser achieves the best return in the majority of settings, with the largest gains on suboptimal offline data and at extreme scales ($N \geq 10^3$).
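To give a feel for the stated $O(H^2/\sqrt{N})$ rate, the following is a minimal sketch that evaluates $C \cdot H^2/\sqrt{N}$ for a few population sizes. The function name `mean_field_error_term` and the constant `C = 1.0` are assumptions made for illustration only; the abstract gives the asymptotic rate, not the constant, and this is not the paper's code.

```python
import math


def mean_field_error_term(horizon: int, n_agents: int, constant: float = 1.0) -> float:
    """Return constant * H^2 / sqrt(N).

    `constant` is a hypothetical placeholder: the abstract only states the
    O(H^2 / sqrt(N)) rate, not the leading constant.
    """
    if horizon <= 0 or n_agents <= 0:
        raise ValueError("horizon and n_agents must be positive")
    return constant * horizon ** 2 / math.sqrt(n_agents)


if __name__ == "__main__":
    # Fixed horizon H = 10; the term shrinks as the population grows.
    for n in (10, 100, 1_000, 10_000):
        print(f"N={n:>6}  term={mean_field_error_term(10, n):.3f}")
```

Under this rate, quadrupling $N$ halves the term, while doubling the horizon $H$ quadruples it, so long horizons need much larger populations for the mean-field approximation to stay tight.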

Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
