This paper introduces Accelerated Constrained Proximal Policy Optimization with a PID-regulated Lagrange multiplier (ACPPO-PID), a safe reinforcement learning (RL) framework for quadrupedal swimming that maximizes thrust while minimizing destabilizing forces. ACPPO-PID enforces constraints through a PID-regulated Lagrange multiplier, accelerates learning with conditional asymmetric clipping, and stabilizes policy updates via cycle-wise geometric aggregation. Towing-tank experiments show that policies learned with ACPPO-PID achieve higher thrust efficiency, lower destabilizing forces, and faster convergence than baselines, and that they transfer effectively to free-swimming trials.
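The abstract does not give the multiplier's update rule or gains. The sketch below shows one common way to regulate a Lagrange multiplier with a PID controller, following the PID Lagrangian formulation of Stooke et al. (2020): the error is the gap between the measured episode cost (here, a stand-in for a destabilizing-force measure) and a cost limit, and the P, I, and D terms act on that gap. The class name, gains, and the `penalized_advantage` helper are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangeMultiplier:
    """Hypothetical PID-regulated Lagrange multiplier (Stooke et al., 2020 style).

    lambda grows while the constraint cost exceeds the limit and falls back
    toward zero once the cost is under the limit.
    """
    kp: float
    ki: float
    kd: float
    cost_limit: float
    integral: float = 0.0
    prev_cost: float = 0.0

    def update(self, episode_cost: float) -> float:
        # Positive error means the constraint is currently violated.
        error = episode_cost - self.cost_limit
        # Integral of the violation, clipped at zero to avoid windup below zero.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only to a rising cost.
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # Project the multiplier onto lambda >= 0.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)


def penalized_advantage(adv_reward: float, adv_cost: float, lam: float) -> float:
    """Combine reward and cost advantages; the 1 + lambda scaling keeps magnitudes stable."""
    return (adv_reward - lam * adv_cost) / (1.0 + lam)


if __name__ == "__main__":
    pid = PIDLagrangeMultiplier(kp=0.5, ki=0.1, kd=0.2, cost_limit=1.0)
    for cost in [2.0, 0.5, 0.5]:
        lam = pid.update(cost)
        print(f"cost={cost:.2f} lambda={lam:.3f}")
```

Compared with plain gradient ascent on the multiplier (an integral-only controller), the proportional and derivative terms let the penalty respond as soon as the cost starts to overshoot, which is the usual motivation for PID regulation in safe RL.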
Quadrupedal robots can now swim more efficiently and stably thanks to a novel constrained reinforcement learning approach that tames destabilizing forces in complex fluid environments.
Bio-inspired aquatic propulsion offers high thrust and maneuverability but is prone to destabilizing forces such as lift fluctuations, which are further amplified by six-degree-of-freedom (6-DoF) fluid coupling. We formulate quadrupedal swimming as a constrained optimization problem that maximizes forward thrust while minimizing destabilizing fluctuations. Our proposed framework, Accelerated Constrained Proximal Policy Optimization with a PID-regulated Lagrange multiplier (ACPPO-PID), enforces constraints with a PID-regulated Lagrange multiplier, accelerates learning via conditional asymmetric clipping, and stabilizes updates through cycle-wise geometric aggregation. Policies are initialized with imitation learning and refined through on-hardware towing-tank experiments, and the resulting ACPPO-PID controllers transfer effectively to quadrupedal free-swimming trials. Results demonstrate improved thrust efficiency, reduced destabilizing forces, and faster convergence compared with state-of-the-art baselines, underscoring the importance of constraint-aware safe reinforcement learning (RL) for robust and generalizable bio-inspired locomotion in complex fluid environments.
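The abstract names conditional asymmetric clipping and cycle-wise geometric aggregation without defining them. The sketch below takes one plausible reading, labeled here as an assumption: each gait cycle gets a single importance ratio equal to the geometric mean of its per-step ratios (the exponential of the mean log-ratio), and the PPO clip range widens on the upper side only while the cost constraint is satisfied. All function names, the `eps_low`/`eps_high` parameters, and the condition itself are hypothetical and may differ from the paper's actual method.

```python
import math
from typing import List, Sequence, Tuple


def cycle_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    """Hypothetical cycle-wise aggregation: geometric mean of per-step ratios.

    exp(mean(log pi_new - log pi_old)) damps single-step outliers compared
    with an arithmetic mean of the raw ratios.
    """
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("need equal-length, non-empty log-probability sequences")
    mean_log_ratio = sum(n - o for n, o in zip(logp_new, logp_old)) / len(logp_new)
    return math.exp(mean_log_ratio)


def clip_bounds(eps_low: float, eps_high: float, constraint_satisfied: bool) -> Tuple[float, float]:
    """Hypothetical conditional asymmetric clipping.

    The upper bound is widened to 1 + eps_high only when the constraint holds;
    otherwise the range falls back to the symmetric [1 - eps_low, 1 + eps_low].
    """
    upper_eps = eps_high if constraint_satisfied else eps_low
    return 1.0 - eps_low, 1.0 + upper_eps


def clipped_surrogate(ratio: float, advantage: float, lower: float, upper: float) -> float:
    """Standard PPO pessimistic surrogate with explicit lower/upper clip bounds."""
    clipped = min(max(ratio, lower), upper)
    return min(ratio * advantage, clipped * advantage)


def cycle_objective(
    cycles: List[Tuple[Sequence[float], Sequence[float], float]],
    eps_low: float,
    eps_high: float,
    constraint_satisfied: bool,
) -> float:
    """Average clipped surrogate over cycles given as (logp_new, logp_old, advantage)."""
    lower, upper = clip_bounds(eps_low, eps_high, constraint_satisfied)
    terms = [
        clipped_surrogate(cycle_ratio(new, old), adv, lower, upper)
        for new, old, adv in cycles
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    cycles = [([0.0, 0.2], [0.0, 0.0], 1.0)]
    print(cycle_objective(cycles, eps_low=0.1, eps_high=0.3, constraint_satisfied=True))
    print(cycle_objective(cycles, eps_low=0.1, eps_high=0.3, constraint_satisfied=False))
```

Under this reading, a wider upper bound lets positive-advantage cycles move the policy further per update while the constraint holds, and aggregating per cycle ties each update to a full stroke rather than to individual control steps.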