This survey comprehensively reviews resilience in Cyber-Physical Systems (CPS), framing the field through five interconnected themes: system-wide properties, learning-enabled CPS challenges, proactive measures, recovery mechanisms, and the central role of the human. It addresses the unique challenges of data scarcity and noisy data in learning-enabled CPS, advocating for synthetic data generation and foundation model adaptation. The survey emphasizes proactive resilience measures, "just good enough" recovery strategies, and the necessity of trust calibration and explainable AI in human-CPS teaming, illustrated through applications in Connected and Autonomous Transportation Systems (CATS) and Medical CPS (MCPS).
Real-world cyber-physical system resilience demands a holistic approach integrating hardware, software, human factors, proactive measures, and adaptive recovery, moving beyond traditional fault models.
Resilience in cyber-physical systems (CPS) is the fundamental ability to maintain safety and critical functionality despite adverse "perturbations," which include security attacks, environmental disruptions, and hardware or software failures. This survey provides a comprehensive review of CPS resilience, framing the field through five interconnected themes that must be integrated to achieve real-world resilience. First, the article posits that resilience is a system-wide property emerging from interactions between hardware, software, and human users. Second, it addresses the challenges of learning-enabled CPS, which often operate in data-scarce environments characterized by imbalanced or noisy data, requiring innovative solutions like synthetic data generation and foundation model adaptation. Third, the survey examines proactive measures for resilience, which include distinctive aspects of verification, testing, and redundancy. Fourth, it explores recovery mechanisms, moving beyond traditional fault models to design "just good enough" recovery strategies that prioritize safety-critical functions during perturbations. Finally, it highlights the central role of the human, focusing on the different levels of human intervention, the necessity of trust calibration, and the requirement for explainable AI to support human-CPS teaming. These themes are illustrated through representative application domains, primarily Connected and Autonomous Transportation Systems (CATS) and Medical CPS (MCPS). By integrating the five themes, this survey provides a systematic roadmap for achieving resilient CPS in increasingly complex and adversarial environments.