This paper introduces Safe-RULE, a learning paradigm that addresses the vulnerability of offline safe reinforcement learning (Safe RL) to data poisoning attacks. Using reinforcement unlearning, Safe-RULE removes the influence of malicious samples without retraining from scratch or access to the original training environment. Experiments across benchmark Safe RL tasks show improved safety performance under data poisoning attacks, supporting the method's use in safety-critical applications.
Safe-RULE removes the influence of poisoned data in offline Safe RL, improving safety without retraining from scratch or access to the original training environment.
Offline safe reinforcement learning (Safe RL) enables policy learning without online interactions, making it suitable for safety-critical systems such as robotic systems. However, its reliance on static datasets exposes offline Safe RL to data poisoning attacks, where adversaries inject malicious samples that compromise safety and induce unsafe policy behavior. In this work, we propose a new learning paradigm, safe reinforcement unlearning (Safe-RULE), a defense framework that removes the influence of poisoned data without retraining from scratch or requiring access to the original training environment. We further extend reinforcement unlearning to offline Safe RL by explicitly accounting for both task performance and safety constraints during the unlearning process. Experiments across benchmark Safe RL tasks demonstrate that our approach effectively enhances safety performance against data poisoning attacks.
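To make "removing the influence of poisoned data without retraining from scratch" concrete, the toy sketch below may help. It is not the Safe-RULE algorithm, which the abstract does not specify. It assumes a hypothetical count-based estimator of per-action cost, and a hypothetical attack that injects samples labelling an unsafe action as cost-free. The class `CostEstimator`, the action names, and all numbers are illustrative assumptions. Because this estimator stores only running sums, subtracting the flagged samples' contributions undoes their effect exactly, with no pass over the clean data.

```python
from collections import defaultdict


class CostEstimator:
    """Toy per-action mean-cost estimator (hypothetical, for illustration only).

    Keeps running sums and counts so that individual samples can be
    removed later without revisiting the rest of the dataset.
    """

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def learn(self, samples):
        # Each sample is an (action, observed_cost) pair.
        for action, cost in samples:
            self.total[action] += cost
            self.count[action] += 1

    def unlearn(self, samples):
        # Subtract the contribution of samples flagged as poisoned.
        for action, cost in samples:
            self.total[action] -= cost
            self.count[action] -= 1

    def mean_cost(self, action):
        return self.total[action] / self.count[action]


# Clean data: the "risky" action always incurs a safety cost of 1.0.
clean = [("risky", 1.0)] * 4 + [("safe", 0.0)] * 4
# Assumed attack: injected samples claim the risky action is cost-free.
poison = [("risky", 0.0)] * 12

est = CostEstimator()
est.learn(clean + poison)
poisoned_estimate = est.mean_cost("risky")  # risky action now looks cheap

est.unlearn(poison)
restored_estimate = est.mean_cost("risky")  # estimate after removing the poison
```

In this toy setting unlearning is exact because the estimator is a sum of per-sample terms. Policies represented by neural networks do not decompose this way, which is why approximate unlearning procedures that also respect reward and safety constraints, as the abstract describes, are needed in practice.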