The paper introduces LexiSafe, an offline safe RL framework for cyber-physical systems. It uses a lexicographic safety-reward hierarchy that prioritizes safety and prevents safety drift. LexiSafe is formulated in a single-cost setting (LexiSafe-SC) and a multi-cost setting (LexiSafe-MC). Each setting comes with safety-violation and performance-suboptimality bounds, and these bounds yield sample-complexity guarantees. In experiments, LexiSafe has fewer safety violations and better task performance than constrained offline RL baselines.
LexiSafe makes safety guarantees in offline RL more tractable. Its lexicographic prioritization comes with provable bounds on both safety violations and performance suboptimality. Empirically, it reduces violations while also improving task performance.
Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe demonstrates reduced safety violations and improved task performance compared to constrained offline baselines. By unifying lexicographic prioritization with structural bias, LexiSafe offers a practical and theoretically grounded approach for safety-critical CPS decision-making.
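The abstract does not specify LexiSafe's exact training objective. The sketch below is therefore not the paper's method. It only illustrates the generic lexicographic ordering idea: objectives are resolved strictly in priority order. Safety costs come first (one cost for the SC case, several for the MC case) and reward comes last. The function name `lexicographic_select`, the per-candidate score dictionaries, and the `slack` tolerance are all hypothetical choices made for illustration.

```python
from typing import Mapping, Sequence


def lexicographic_select(
    candidates: Sequence[str],
    objectives: Sequence[Mapping[str, float]],
    slack: float = 0.0,
) -> str:
    """Hypothetical illustration of lexicographic selection (not LexiSafe itself).

    `objectives` is ordered from highest to lowest priority. Each maps a
    candidate to a score to MINIMIZE (safety costs as-is, rewards negated).
    At each level, only candidates within `slack` of the best score survive,
    so a lower-priority objective (e.g. reward) can never override a
    higher-priority one (e.g. a safety cost) beyond that tolerance.
    """
    remaining = list(candidates)
    for scores in objectives:
        best = min(scores[c] for c in remaining)
        remaining = [c for c in remaining if scores[c] <= best + slack]
    # Ties after the last objective are broken by original candidate order.
    return remaining[0]


if __name__ == "__main__":
    actions = ["a", "b", "c"]
    safety_cost = {"a": 0.0, "b": 0.0, "c": 0.5}   # priority 1: minimize cost
    neg_reward = {"a": -1.0, "b": -3.0, "c": -10.0}  # priority 2: maximize reward
    print(lexicographic_select(actions, [safety_cost, neg_reward]))
```

Running the script prints `b`. Action `c` has the highest reward, but it is filtered out at the safety level first. With `slack=0.5`, `c` would survive the safety filter and be selected instead. That outcome shows how a tolerance on higher-priority objectives trades strictness for reward. For a multi-cost hierarchy, additional cost mappings would simply be placed ahead of the reward mapping in `objectives`.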