Mar 17, 2026 · arXiv:2603.17092

SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion

E. Daneshmand, Shafeef Omar, Glen Berseth, Majid Khadiv, Hsiu-Chin Lin

AI Summary

SLowRL targets safe, efficient fine-tuning of RL locomotion policies on hardware by combining Low-Rank Adaptation (LoRA) with a recovery policy that enforces safety during training. The authors fine-tune policies learned in simulation directly on a Unitree Go2 quadruped robot, evaluating jump and trot tasks. Compared to standard PPO baselines, the method cuts fine-tuning time by 46.5% and keeps safety violations near zero. A rank-1 adaptation alone proves sufficient to recover pre-trained performance in the real world.

Key Contribution

Rank-1 LoRA fine-tuning, paired with a recovery policy for training-time safety, adapts simulation-trained locomotion policies to a real robot, cutting fine-tuning time by 46.5% relative to PPO baselines while keeping safety violations near zero.
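
To make "rank-1 adaptation" concrete, the following is a minimal sketch of a LoRA-style linear layer: the pre-trained weight matrix stays frozen, and only two vectors `a` and `b` are trained, contributing the rank-1 correction `b a^T`. The class name `Rank1LoRALinear`, the `scale` parameter, and the initialization scheme are illustrative assumptions following the usual LoRA recipe; the page does not state which layers SLowRL adapts or how its adapters are scaled or initialized.

```python
import random


class Rank1LoRALinear:
    """Frozen linear map W x plus a trainable rank-1 correction (b a^T) x.

    Hypothetical illustration of generic rank-1 LoRA, not the paper's code.
    """

    def __init__(self, weight, scale=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weights, shape (out, in)
        out_dim, in_dim = len(weight), len(weight[0])
        # Usual LoRA-style init: small random `a`, zero `b`, so the adapted
        # layer starts out identical to the pre-trained one.
        self.a = [rng.gauss(0.0, 0.01) for _ in range(in_dim)]
        self.b = [0.0] * out_dim
        self.scale = scale

    def forward(self, x):
        # Frozen path: W x
        base = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        # Adapter path: b * (a^T x); a^T x is a single scalar for rank 1
        proj = sum(ai * xi for ai, xi in zip(self.a, x))
        return [y + self.scale * bj * proj for y, bj in zip(base, self.b)]

    def num_trainable(self):
        # Only the two adapter vectors are trained: in_dim + out_dim values
        return len(self.a) + len(self.b)


layer = Rank1LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(layer.forward([1.0, 1.0]))  # matches the frozen layer at initialization
print(layer.num_trainable())      # 5 trainable values vs. 6 in the full matrix
```

The saving grows with layer size: for a hypothetical 256 x 256 layer, a rank-1 adapter trains 512 values instead of 65,536.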

Abstract

Sim-to-real transfer of locomotion policies often leads to performance degradation due to the inevitable sim-to-real gap. Naively fine-tuning these policies directly on hardware is problematic, as it poses risks of mechanical failure and suffers from high sample inefficiency. In this paper, we address the challenge of safely and efficiently fine-tuning reinforcement learning (RL) policies for dynamic locomotion tasks. Specifically, we focus on fine-tuning policies learned in simulation directly on hardware, while explicitly enforcing safety constraints. In doing so, we introduce SLowRL, a framework that combines Low-Rank Adaptation (LoRA) with training-time safety enforcement via a recovery policy. We evaluate our method both in simulation and on a real Unitree Go2 quadruped robot for jump and trot tasks. Experimental results show that our method achieves a $46.5\%$ reduction in fine-tuning time and near-zero safety violations compared to standard proximal policy optimization (PPO) baselines. Notably, we find that a rank-1 adaptation alone is sufficient to recover pre-trained performance in the real world, while maintaining stable and safe real-world fine-tuning. These results demonstrate the practicality of safe, efficient fine-tuning for dynamic real-world robotic applications.
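
The abstract does not describe how the recovery policy is triggered, so the following is only a rough sketch of one common pattern for training-time safety enforcement: a wrapper that hands control to a recovery policy whenever a safety check flags the current state. The names `SafeActionSelector` and `is_unsafe`, the `(roll, pitch)` state layout, and the 0.6 rad tilt threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]


@dataclass
class SafeActionSelector:
    """Chooses between a task policy and a recovery policy (hypothetical)."""

    task_policy: Callable[[State], Action]
    recovery_policy: Callable[[State], Action]
    is_unsafe: Callable[[State], bool]
    interventions: int = 0  # number of times recovery took over

    def act(self, state: State) -> Tuple[Action, bool]:
        # Returns the action plus a flag saying whether recovery intervened.
        if self.is_unsafe(state):
            self.interventions += 1
            return self.recovery_policy(state), True
        return self.task_policy(state), False


# Assumed state layout: (roll, pitch) in radians; assumed limit: 0.6 rad.
def tilt_exceeded(state: State, limit: float = 0.6) -> bool:
    roll, pitch = state[0], state[1]
    return abs(roll) > limit or abs(pitch) > limit


selector = SafeActionSelector(
    task_policy=lambda s: (1.0, 0.0),       # placeholder task action
    recovery_policy=lambda s: (0.0, 0.0),   # placeholder "stand still" action
    is_unsafe=tilt_exceeded,
)
print(selector.act((0.1, 0.0)))  # task policy acts, no intervention
print(selector.act((0.9, 0.0)))  # recovery policy takes over
```

A training loop could use the returned flag to count interventions or to decide how transitions collected under the recovery policy are treated; how SLowRL handles this is not stated on this page.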

Robotics & Embodied AI · Training Efficiency & Optimization · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 24

Year: 2026

Venue: N/A
