AI Lab | CAS | Li Auto | Mar 4, 2026 | arXiv:2603.04073

Swimming Under Constraints: A Safe Reinforcement Learning Framework for Quadrupedal Bio-Inspired Propulsion

Xinyu Cui, Fei Han, Hang Xu, Yongcheng Zeng, Luoyang Sun, Ruizhi Zhang, Jian Zhao, Haifeng Zhang, Weikun Li, Jun Wang, Dixia Fan

AI Summary

This paper introduces Accelerated Constrained Proximal Policy Optimization with a PID-regulated Lagrange multiplier (ACPPO-PID), a safe RL framework for quadrupedal swimming that maximizes thrust while minimizing destabilizing forces. ACPPO-PID enforces constraints using a PID-regulated Lagrange multiplier, accelerates learning with conditional asymmetric clipping, and stabilizes updates via cycle-wise geometric aggregation. Experiments in a towing tank demonstrate that policies learned with ACPPO-PID exhibit improved thrust efficiency, reduced destabilizing forces, and faster convergence compared to baselines, and transfer effectively to free-swimming trials.

Key Contribution

ACPPO-PID, a constrained reinforcement learning approach, lets a quadrupedal robot swim with higher thrust efficiency and lower destabilizing forces in complex fluid environments.

Abstract

Bio-inspired aquatic propulsion offers high thrust and maneuverability but is prone to destabilizing forces such as lift fluctuations, which are further amplified by six-degree-of-freedom (6-DoF) fluid coupling. We formulate quadrupedal swimming as a constrained optimization problem that maximizes forward thrust while minimizing destabilizing fluctuations. Our proposed framework, Accelerated Constrained Proximal Policy Optimization with a PID-regulated Lagrange multiplier (ACPPO-PID), enforces constraints with a PID-regulated Lagrange multiplier, accelerates learning via conditional asymmetric clipping, and stabilizes updates through cycle-wise geometric aggregation. Initialized with imitation learning and refined through on-hardware towing-tank experiments, ACPPO-PID produces control policies that transfer effectively to quadrupedal free-swimming trials. Results demonstrate improved thrust efficiency, reduced destabilizing forces, and faster convergence compared with state-of-the-art baselines, underscoring the importance of constraint-aware safe RL for robust and generalizable bio-inspired locomotion in complex fluid environments.
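The abstract names a PID-regulated Lagrange multiplier but gives no formulas, so the following is only a minimal sketch of the general PID-Lagrangian idea and may differ from the paper's actual ACPPO-PID update. The class name, the gains `kp`, `ki`, `kd`, the `cost_limit` budget, and the `penalized_advantage` helper are all hypothetical, chosen for illustration. The sketch treats the constraint violation (measured cost minus budget) as the error signal of a PID controller and uses the controller output, clipped at zero, as the multiplier that weights the cost term in the policy objective.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDLagrangeMultiplier:
    """Illustrative PID-regulated multiplier for a single cost constraint.

    All default values are hypothetical; none are taken from the paper.
    """
    cost_limit: float                  # assumed budget d in the constraint J_c <= d
    kp: float = 0.1                    # proportional gain (assumed)
    ki: float = 0.01                   # integral gain (assumed)
    kd: float = 0.05                   # derivative gain (assumed)
    integral: float = 0.0
    prev_cost: Optional[float] = None
    value: float = 0.0

    def update(self, measured_cost: float) -> float:
        """Advance the controller by one policy iteration and return lambda >= 0."""
        error = measured_cost - self.cost_limit
        # Integral term is clamped at zero so constraint slack cannot accumulate.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only to rising cost; it is zero on the first call.
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, measured_cost - self.prev_cost)
        self.prev_cost = measured_cost
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.value = max(0.0, raw)     # a Lagrange multiplier must stay non-negative
        return self.value


def penalized_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    """Combine reward and cost advantages; dividing by (1 + lam) keeps the scale bounded."""
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


if __name__ == "__main__":
    pid = PIDLagrangeMultiplier(cost_limit=1.0)
    for cost in (2.0, 1.5, 0.5):
        lam = pid.update(cost)
        print(f"cost={cost:.2f}  lambda={lam:.4f}")
```

In a sketch like this, the multiplier grows while the measured destabilizing cost exceeds its budget and falls back toward zero once the constraint is satisfied, so the thrust objective dominates again. The conditional asymmetric clipping and cycle-wise geometric aggregation mentioned in the abstract are not covered here.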

Robotics & Embodied AI | Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 31
Year: 2026
Venue: N/A
