Tsinghua AI; Beihang; Central South University; College of Information and Control Engineering; First Aircraft Institute of Aviation; Science and Technology on Complex
Apr 8, 2026
arXiv:2604.07171

Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization

Yong Si, Mingfei Lu, Yang Hu, Guijiang Li, Yueheng Song, Zhaokui Wang

AI Summary

This paper introduces Smart Commander, a hierarchical reinforcement learning (HRL) framework for optimizing fleet-level maintenance and logistics decisions in military aviation Prognostics and Health Management (PHM). It decomposes the problem into two tiers: a strategic "General Commander" that manages fleet-level objectives, and tactical "Operation Commanders" that execute specific actions. In experiments run in a high-fidelity discrete-event simulation, Smart Commander outperforms monolithic deep reinforcement learning (DRL) and rule-based baselines, with shorter training time, better scalability, and improved robustness in failure-prone environments.

Key Contribution

Hierarchical RL can tame the curse of dimensionality in fleet management, enabling superior maintenance and logistics decisions compared to monolithic approaches.
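To make the dimensionality argument concrete, the following is a rough back-of-the-envelope sketch, not a calculation from the paper. It assumes, purely for illustration, that each of N aircraft receives one of k actions per decision step, and that a hierarchy lets a high-level policy pick one of g goals before each aircraft is decided separately. The function names and the numbers N = 24, k = 4, g = 3 are hypothetical.

```python
def joint_action_count(num_aircraft: int, actions_per_aircraft: int) -> int:
    """Joint actions a single monolithic policy must rank in one step (k ** N)."""
    return actions_per_aircraft ** num_aircraft


def hierarchical_action_count(
    num_aircraft: int, actions_per_aircraft: int, num_goals: int
) -> int:
    """Choices evaluated when one goal is picked first and each aircraft is
    then decided on its own (g + N * k). This factorization is an assumption
    made for illustration, not the decomposition used in the paper."""
    return num_goals + num_aircraft * actions_per_aircraft


if __name__ == "__main__":
    n, k, g = 24, 4, 3  # hypothetical fleet size, per-aircraft actions, goals
    print(joint_action_count(n, k))            # 281474976710656
    print(hierarchical_action_count(n, k, g))  # 99
```

The gap between exponential and linear growth is what "curse of dimensionality" refers to here; how much of that gap a real hierarchy recovers depends on how well the problem actually factorizes.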

Abstract

Decision-making in military aviation Prognostics and Health Management (PHM) faces significant challenges due to the "curse of dimensionality" in large-scale fleet operations, combined with sparse feedback and stochastic mission profiles. To address these issues, this paper proposes Smart Commander, a novel Hierarchical Reinforcement Learning (HRL) framework designed to optimize sequential maintenance and logistics decisions. The framework decomposes the complex control problem into a two-tier hierarchy: a strategic General Commander manages fleet-level availability and cost objectives, while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation. The proposed approach is validated within a custom-built, high-fidelity discrete-event simulation environment that captures the dynamics of aircraft configuration and support logistics. By integrating layered reward shaping with planning-enhanced neural networks, the method effectively addresses the difficulty of sparse and delayed rewards. Empirical evaluations demonstrate that Smart Commander significantly outperforms conventional monolithic Deep Reinforcement Learning (DRL) and rule-based baselines. Notably, it achieves a substantial reduction in training time while demonstrating superior scalability and robustness in failure-prone environments. These results highlight the potential of HRL as a reliable paradigm for next-generation intelligent fleet management.
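The two-tier control flow described in the abstract might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the authors' implementation: the class names mirror the paper's terminology, but the goal names, action names, the 0.7 availability threshold, the random tactical policy, and the additive form of the layered reward are all hypothetical placeholders. The planning-enhanced networks mentioned in the abstract are not modeled.

```python
import random
from dataclasses import dataclass

# Hypothetical goal and action vocabularies; the abstract names only the
# three task areas, not the concrete goals or actions.
GOALS = ("maximize_availability", "minimize_cost")
TASKS = {
    "sortie_generation": ("launch", "hold"),
    "maintenance_scheduling": ("repair_now", "defer"),
    "resource_allocation": ("order_spares", "wait"),
}


@dataclass
class FleetState:
    available: int  # aircraft currently mission-capable
    total: int      # fleet size
    cost: float     # accumulated support cost


class GeneralCommander:
    """Strategic tier: picks a fleet-level goal from the fleet state."""

    def select_goal(self, state: FleetState) -> str:
        # Placeholder rule standing in for a learned strategic policy.
        return GOALS[0] if state.available / state.total < 0.7 else GOALS[1]


class OperationCommander:
    """Tactical tier: picks an action for one task area, given the goal."""

    def __init__(self, task: str, rng: random.Random) -> None:
        self.task = task
        self.rng = rng

    def select_action(self, state: FleetState, goal: str) -> str:
        # Random choice standing in for a learned goal-conditioned policy.
        return self.rng.choice(TASKS[self.task])


def layered_reward(extrinsic: float, goal_progress: float, weight: float = 0.1) -> float:
    """Assumed shaping form: sparse environment reward plus a dense
    goal-progress bonus for the tactical tier."""
    return extrinsic + weight * goal_progress


def decide(state: FleetState, general: GeneralCommander, operators: list) -> tuple:
    """One decision step: the goal is chosen first, then one action per task."""
    goal = general.select_goal(state)
    actions = {op.task: op.select_action(state, goal) for op in operators}
    return goal, actions


if __name__ == "__main__":
    rng = random.Random(0)
    operators = [OperationCommander(task, rng) for task in TASKS]
    state = FleetState(available=5, total=10, cost=0.0)
    print(decide(state, GeneralCommander(), operators))
```

The point of the structure is that each Operation Commander sees only its own small action set and a goal signal, so its reward can be made denser than the fleet-level outcome the General Commander is judged on.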

Robotics & Embodied AI; Tool Use & Agents; World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
