Notre Dame | Jun 8, 2026 | arXiv:2606.09559

Safe-RULE: Safe Reinforcement UnLEarning

Shixiong Jiang, Taozheng Zhu, Fanxin Kong

AI Summary

This paper introduces Safe-RULE, a learning paradigm that addresses the vulnerability of offline safe reinforcement learning (Safe RL) to data poisoning attacks. Using reinforcement unlearning, Safe-RULE removes the influence of malicious samples without retraining from scratch or accessing the original training environment. Experiments across benchmark Safe RL tasks show improved safety performance under data poisoning attacks, supporting the method's use in safety-critical applications.

Key Contribution

Safe-RULE removes the influence of poisoned data in offline Safe RL, improving safety without retraining from scratch or access to the original training environment.

Abstract

Offline safe reinforcement learning (Safe RL) enables policy learning without online interactions, making it suitable for safety-critical systems such as robotics. However, its reliance on static datasets exposes offline Safe RL to data poisoning attacks, where adversaries inject malicious samples that compromise safety and induce unsafe policy behavior. In this work, we propose a new learning paradigm, named safe reinforcement unlearning (Safe-RULE), a defense framework that removes the influence of poisoned data without retraining from scratch or requiring access to the original training environment. We further extend reinforcement unlearning to offline Safe RL by explicitly accounting for both task performance and safety constraints during the unlearning process. Experiments across benchmark Safe RL tasks demonstrate that our approach effectively enhances safety performance against data poisoning attacks.
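
The abstract does not give the unlearning objective, so the following is only a toy sketch of the general idea, not the Safe-RULE algorithm. It assumes a scalar linear policy `a = w * s` fitted by behavior cloning, a hypothetical action bound `A_MAX` as the safety constraint, and a generic unlearning update: descend the loss on retained data, ascend the loss on the identified poisoned ("forget") data up to a cap, and penalize constraint violations. All names, data, and hyperparameters are illustrative assumptions.

```python
# Toy illustration only: scalar linear policy a = w * s. NOT the paper's method.

def bc_loss(w, data):
    """Mean squared behavior-cloning error over (state, action) pairs."""
    return sum((w * s - a) ** 2 for s, a in data) / len(data)

def bc_grad(w, data):
    """Derivative of bc_loss with respect to w."""
    return sum(2.0 * (w * s - a) * s for s, a in data) / len(data)

def safety_cost(w, states, a_max):
    """Mean squared violation of the assumed bound |a| <= a_max."""
    return sum(max(0.0, abs(w * s) - a_max) ** 2 for s in states) / len(states)

def safety_grad(w, states, a_max):
    """Derivative of safety_cost with respect to w."""
    total = 0.0
    for s in states:
        a = w * s
        excess = abs(a) - a_max
        if excess > 0.0:
            total += 2.0 * excess * (1.0 if a > 0.0 else -1.0) * s
    return total / len(states)

def fit(data, steps=500, lr=0.1):
    """Plain gradient descent on the behavior-cloning loss."""
    w = 0.0
    for _ in range(steps):
        w -= lr * bc_grad(w, data)
    return w

def unlearn(w, retain, forget, a_max, alpha=0.2, beta=1.0, cap=2.0,
            steps=300, lr=0.05):
    """Fine-tune w without retraining from scratch and without an environment."""
    states = [s for s, _ in retain]
    for _ in range(steps):
        # Task term (retain data) plus safety penalty.
        g = bc_grad(w, retain) + beta * safety_grad(w, states, a_max)
        # Bounded ascent on the forget loss: stop once it exceeds the cap.
        if bc_loss(w, forget) < cap:
            g -= alpha * bc_grad(w, forget)
        w -= lr * g
    return w

A_MAX = 1.0  # hypothetical safety bound on |action|
retain = [(i / 10, 1.0 * (i / 10)) for i in range(-10, 11)]   # clean: a = s
forget = [(s, 3.0 * s) for s in (-1.0, -0.5, 0.5, 1.0)]       # poisoned: a = 3s
states = [s for s, _ in retain]

w_poisoned = fit(retain + forget)
w_unlearned = unlearn(w_poisoned, retain, forget, A_MAX)
print("poisoned w:", w_poisoned, "cost:", safety_cost(w_poisoned, states, A_MAX))
print("unlearned w:", w_unlearned, "cost:", safety_cost(w_unlearned, states, A_MAX))
```

In this toy setting the poisoned samples pull `w` above the safe value of 1, and the unlearning update moves it back toward 1 using only the stored datasets. The cap on the ascent term is one simple way to keep the forget-loss ascent from degrading task performance; a real offline Safe RL method would operate on value functions and cost critics rather than a single weight.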

Red-Teaming & Adversarial Robustness | Robotics & Embodied AI

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
