Mar 17, 2026 | arXiv:2603.16470

Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

Marios Aristodemou, Yasaman Omid, Sangarapillai Lambotharan, Mahsa Derakhshan, Lajos Hanzo

AI Summary

This paper addresses the challenge of outdated Channel State Information (CSI) in multi-satellite communication systems by proposing a multi-agent reinforcement learning (MARL) algorithm. The algorithm, named Dual Stage Proximal Policy Optimisation (DS-PPO), tackles large continuous action spaces and non-IID environments by using a bi-level optimisation procedure. Results demonstrate DS-PPO's robustness to CSI imperfections and improved sum-rate performance, along with convergence analysis and computational complexity assessment.

Key Contribution

A novel MARL algorithm, DS-PPO, enables multi-satellite systems to maximize user sum-rate despite outdated channel state information, offering a practical solution for robust global connectivity.

Abstract

The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BSs) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm that aims to maximise the sum-rate of the users while coping with the outdated CSI. We design a novel bi-level optimisation procedure, termed dual stage proximal policy optimisation (DS-PPO), for tackling the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite, and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attained by the use of DS-PPO. In addition, we provide the convergence analysis of DS-PPO along with its computational complexity.
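To make the role of outdated CSI in the sum-rate objective concrete, the following is a minimal sketch and not the paper's DS-PPO method. It assumes a first-order Gauss-Markov ageing model with correlation coefficient `rho`, matched-filter (maximum-ratio) precoding with equal power per user, and unit noise power; none of these choices come from the abstract, and all function names are hypothetical.

```python
import math
import random


def crandn(rng):
    """Draw one circularly symmetric complex Gaussian sample with unit variance."""
    std = math.sqrt(0.5)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def age_channel(h, rho, rng):
    """Outdated observation of channel h (ASSUMED first-order Gauss-Markov model).

    rho = 1 means perfectly current CSI; rho = 0 means fully uncorrelated CSI.
    """
    scale = math.sqrt(1.0 - rho ** 2)
    return [rho * x + scale * crandn(rng) for x in h]


def mrt_precoders(H_obs, total_power):
    """Matched-filter precoders built from the observed channels.

    Power is split equally across users (an assumption made for illustration).
    """
    per_user = total_power / len(H_obs)
    W = []
    for h in H_obs:
        norm = math.sqrt(sum(abs(x) ** 2 for x in h))
        W.append([math.sqrt(per_user) * x.conjugate() / norm for x in h])
    return W


def sum_rate(H_true, W, noise_power=1.0):
    """Sum over users of log2(1 + SINR), evaluated on the true channels."""
    rate = 0.0
    for k, h in enumerate(H_true):
        gains = [abs(sum(a * b for a, b in zip(h, w))) ** 2 for w in W]
        interference = sum(g for j, g in enumerate(gains) if j != k)
        rate += math.log2(1.0 + gains[k] / (interference + noise_power))
    return rate


if __name__ == "__main__":
    rng = random.Random(0)
    # 3 users, 8 transmit antennas in total across the cooperating satellites.
    H = [[crandn(rng) for _ in range(8)] for _ in range(3)]
    for rho in (1.0, 0.9, 0.5):
        H_obs = [age_channel(h, rho, rng) for h in H]
        print(rho, sum_rate(H, mrt_precoders(H_obs, total_power=10.0)))
```

The precoders are computed from the aged observations, while the rate is evaluated on the true channels; this mismatch is the degradation that a CSI-robust policy would aim to counteract.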

Distributed Systems & Hardware; Training Efficiency & Optimization


Year: 2026

Venue: N/A
