This paper introduces a graph-based multi-agent reinforcement learning (MARL) framework for decentralized cooperative UAV deployment, trained with centralized training and decentralized execution (CTDE). The architecture uses an agent-entity attention module to encode local state and nearby entities, and aggregates inter-UAV messages with neighbor self-attention over a distance-limited communication graph. Evaluated on cooperative relay (DroneConnect) and adversarial engagement (DroneCombat) tasks, the method achieves high coverage under restricted communication and generalizes to unseen team sizes.
UAV swarms can achieve near-optimal cooperative deployment and generalize to new team sizes using a communication-aware MARL approach, even with limited communication and partial observability.
Autonomous Unmanned Aerial Vehicle (UAV) swarms are increasingly used as rapidly deployable aerial relays and sensing platforms, yet practical deployments must operate under partial observability and intermittent peer-to-peer links. We present a graph-based multi-agent reinforcement learning framework trained under centralized training with decentralized execution (CTDE): a centralized critic and global state are available only during training, while each UAV executes a shared policy using local observations and messages from nearby neighbors. Our architecture encodes local agent state and nearby entities with an agent-entity attention module, and aggregates inter-UAV messages with neighbor self-attention over a distance-limited communication graph. We evaluate primarily on a cooperative relay deployment task (DroneConnect) and secondarily on an adversarial engagement task (DroneCombat). In DroneConnect, the proposed method achieves high coverage under restricted communication and partial observation (e.g., 74% coverage with M = 5 UAVs and N = 10 nodes) while remaining competitive with an offline mixed-integer linear programming (MILP) upper bound, and it generalizes to unseen team sizes without fine-tuning. In the adversarial setting, the same framework transfers without architectural changes and improves win rate over non-communicating baselines.
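The core message-aggregation step described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: random untrained projection matrices stand in for the learned query/key/value weights, and the function name and signature are hypothetical. It shows the key mechanism, self-attention over messages in which attention scores are masked by a distance-limited communication graph, so each UAV only aggregates messages from peers within communication range (and from itself).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_attention(positions, messages, comm_range, seed=0):
    """Aggregate per-UAV messages with self-attention restricted to
    neighbors inside comm_range (a distance-limited communication graph).

    positions: (n, 2) UAV coordinates
    messages:  (n, d) per-UAV message vectors
    Returns:   (n, d) aggregated messages, one per UAV.
    """
    n, d = messages.shape
    # Illustrative stand-ins for the learned Q/K/V projections.
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = messages @ Wq, messages @ Wk, messages @ Wv

    # Pairwise distances define the communication graph; each UAV is
    # always its own neighbor (distance 0).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    in_range = dist <= comm_range

    # Scaled dot-product scores; out-of-range peers are masked out
    # before the softmax, so they receive ~zero attention weight.
    scores = (Q @ K.T) / np.sqrt(d)
    scores = np.where(in_range, scores, -1e9)
    return softmax(scores, axis=-1) @ V

# Example: 5 UAVs, 8-dim messages, communication range 2.0.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [1.5, 1.0], [5.0, 5.0], [0.5, 0.5]])
messages = np.random.default_rng(1).standard_normal((5, 8))
agg = neighbor_attention(positions, messages, comm_range=2.0)
```

An isolated UAV (here the one at (5, 5), out of range of all peers) attends only to its own message, which is what lets the same shared policy run on teams of unseen size: the aggregation depends only on each agent's local neighborhood, not on a fixed team dimension.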