Cornell | Iowa State | Feb 19, 2026 | arXiv:2602.17312

LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy

Hsin-Jung Yang, Zhanhong Jiang, Prajwal Koirala, Qisai Liu, Cody Fleming, Soumik Sarkar

AI Summary

The paper introduces LexiSafe, an offline safe RL framework that uses a lexicographic safety-reward hierarchy to prioritize safety and prevent safety drift in cyber-physical systems. LexiSafe is formulated in both single-cost (LexiSafe-SC) and multi-cost (LexiSafe-MC) settings, each with associated safety-violation and performance-suboptimality bounds, leading to sample complexity guarantees. Experiments demonstrate that LexiSafe achieves fewer safety violations and better task performance than constrained offline RL baselines.

Key Contribution

LexiSafe ranks safety above reward lexicographically in offline RL. It comes with safety-violation and performance-suboptimality bounds, and in experiments it shows fewer safety violations and better task performance than constrained offline baselines.

Abstract

Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe demonstrates reduced safety violations and improved task performance compared to constrained offline baselines. By unifying lexicographic prioritization with structural bias, LexiSafe offers a practical and theoretically grounded approach for safety-critical CPS decision-making.
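To make the idea of lexicographic prioritization concrete, here is a minimal Python sketch of lexicographic action selection over value estimates. It is a generic illustration, not the LexiSafe algorithm: the function name `lexicographic_action`, the `slack` tolerance, and the toy value tables are all assumptions introduced here. Costs are filtered first, in priority order, and reward is only used to choose among the actions that survive. A single cost list loosely mirrors the single-cost setting, and several lists mirror a hierarchy of safety costs.

```python
def lexicographic_action(reward_q, cost_qs, slack=0.0):
    """Choose an action by lexicographic priority: costs first, then reward.

    reward_q: list of estimated reward values per action (higher is better).
    cost_qs:  list of per-action cost lists, highest priority first
              (lower is better).
    slack:    hypothetical tolerance; at each cost level, actions within
              `slack` of the best remaining cost stay in the candidate set.
    """
    candidates = list(range(len(reward_q)))
    for cost_q in cost_qs:
        # Keep only the actions that are (near-)optimal for this cost level.
        best_cost = min(cost_q[a] for a in candidates)
        candidates = [a for a in candidates if cost_q[a] <= best_cost + slack]
    # Reward only breaks ties among the safest remaining actions.
    return max(candidates, key=lambda a: reward_q[a])


# Toy values for four actions (illustrative numbers, not from the paper).
reward_q = [1.0, 5.0, 3.0, 4.0]
primary_cost = [0.0, 2.0, 0.0, 0.05]
secondary_cost = [0.0, 0.0, 1.0, 0.5]

print(lexicographic_action(reward_q, [primary_cost]))                  # 2
print(lexicographic_action(reward_q, [primary_cost], slack=0.1))       # 3
print(lexicographic_action(reward_q, [primary_cost, secondary_cost],
                           slack=0.1))                                 # 0
```

Note that action 1 has the highest reward but is never selected, because it loses at the first cost level. This is the structural difference from a weighted sum of reward and cost, where a large enough reward can outweigh a safety penalty.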

Constitutional AI & AI Ethics | RLHF & Preference Learning | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
