The paper introduces a stability-aligned residual control architecture that lets robotic systems rapidly recover from unobserved dynamics shifts during deployment. A fixed reinforcement learning policy trained under nominal dynamics is augmented by a bounded additive residual channel regulated by a Stability Alignment Gate (SAG). The SAG limits corrective authority through magnitude constraints, directional coherence with the nominal action, performance-conditioned activation, and adaptive gain modulation, enabling rapid compensation without retraining or privileged disturbance information. Experiments on quadruped, biped, humanoid, and wheeled platforms show consistent reductions in recovery time compared with frozen and online-adaptation baselines.
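To make the gating idea concrete, here is a minimal Python sketch of how four SAG-style mechanisms might compose with a frozen policy's action. It is an illustration under stated assumptions, not the paper's formulation: the class name `StabilityAlignmentGateSketch`, the residual bound `r_max`, the activation tolerance `perf_tol`, the linear gain schedule, the externally supplied `perf_drop` signal, and the rule of projecting out any residual component that opposes the nominal action are all hypothetical choices.

```python
import math
from typing import List, Sequence


def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


class StabilityAlignmentGateSketch:
    """Hypothetical gate on an additive residual; NOT the paper's exact SAG."""

    def __init__(self, r_max: float = 0.2, perf_tol: float = 0.1,
                 gain_rate: float = 0.5, gain_max: float = 1.0) -> None:
        self.r_max = r_max          # assumed bound on the residual norm
        self.perf_tol = perf_tol    # assumed performance drop that activates the residual
        self.gain_rate = gain_rate  # assumed per-step change of the adaptive gain
        self.gain_max = gain_max    # assumed cap on the adaptive gain
        self.gain = 0.0             # gain starts closed: output equals the nominal action

    def __call__(self, a_nom: Sequence[float], r_raw: Sequence[float],
                 perf_drop: float) -> List[float]:
        a_nom = [float(v) for v in a_nom]
        r = [float(v) for v in r_raw]

        # 1) Directional coherence (assumed rule): if the residual points against
        #    the nominal action, remove its component along that action.
        nom_sq = _dot(a_nom, a_nom)
        if nom_sq > 0.0:
            along = _dot(r, a_nom) / nom_sq
            if along < 0.0:
                r = [ri - along * ai for ri, ai in zip(r, a_nom)]

        # 2) Magnitude constraint: rescale so that ||r|| <= r_max.
        norm = math.sqrt(_dot(r, r))
        if norm > self.r_max:
            r = [ri * self.r_max / norm for ri in r]

        # 3) Performance-conditioned activation and 4) adaptive gain: the gain
        #    ramps up while the drop exceeds the tolerance and decays otherwise.
        step = self.gain_rate if perf_drop > self.perf_tol else -self.gain_rate
        self.gain = min(self.gain_max, max(0.0, self.gain + step))

        # The frozen policy's action is never modified; only the gated residual is added.
        return [ai + self.gain * ri for ai, ri in zip(a_nom, r)]
```

For example, with `a_nom = [1.0, 0.0]`, a raw residual `[-1.0, 1.0]`, and `perf_drop = 0.5`, the opposing component is removed, the remainder is clipped to norm 0.2, and the half-open gain yields `[1.0, 0.1]`; once the drop falls below the tolerance the gain decays back to zero and the output is again exactly the nominal action. Keeping the residual additive and gain-limited is one simple way to leave the nominal closed loop untouched when nothing has shifted.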
Robots can cut recovery time after unexpected mid-episode dynamics shifts by up to 87% by adding a small, carefully constrained control signal on top of a pre-trained policy, without any further training.
Robotic systems operating in real-world environments inevitably encounter unobserved dynamics shifts during continuous execution, including changes in actuation, mass distribution, or contact conditions. When such shifts occur mid-episode, even locally stabilizing learned policies can experience substantial transient performance degradation. While input-to-state stability guarantees bounded state deviation, it does not ensure rapid restoration of task-level performance. We address inference-time recovery under frozen policy parameters by casting adaptation as constrained disturbance shaping around a nominal stabilizing controller. We propose a stability-aligned residual control architecture in which a reinforcement learning policy trained under nominal dynamics remains fixed at deployment, and adaptation occurs exclusively through a bounded additive residual channel. A Stability Alignment Gate (SAG) regulates corrective authority through magnitude constraints, directional coherence with the nominal action, performance-conditioned activation, and adaptive gain modulation. These mechanisms preserve the nominal closed-loop structure while enabling rapid compensation for unobserved dynamics shifts without retraining or privileged disturbance information. Across mid-episode perturbations including actuator degradation, mass variation, and contact changes, the proposed method consistently reduces recovery time relative to frozen and online-adaptation baselines while maintaining near-nominal steady-state performance. Relative to a frozen SAC policy, recovery time is reduced on average across evaluated conditions by 87% on the Go1 quadruped, 48% on the Cassie biped, 30% on the H1 humanoid, and 20% on the Scout wheeled platform.
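The abstract reports recovery-time reductions but does not say how recovery time is measured. The sketch below shows one plausible definition, assumed here purely for illustration: the time from the shift until a per-step performance signal first stays within a tolerance band of its nominal value for a hold window. The function names `recovery_time` and `percent_reduction`, and the parameters `tol` and `hold`, are hypothetical.

```python
from typing import Optional, Sequence


def recovery_time(perf: Sequence[float], dt: float, shift_idx: int,
                  nominal: float, tol: float, hold: int) -> Optional[float]:
    """Seconds from the shift until `perf` first stays within `tol` of
    `nominal` for `hold` consecutive steps (assumed definition).
    Returns None if the signal never recovers within the episode."""
    run = 0  # length of the current streak of in-band steps
    for i in range(shift_idx, len(perf)):
        if abs(perf[i] - nominal) <= tol:
            run += 1
            if run == hold:
                first_in_band = i - hold + 1
                return (first_in_band - shift_idx) * dt
        else:
            run = 0  # streak broken; the signal left the band again
    return None


def percent_reduction(t_baseline: float, t_method: float) -> float:
    """Relative reduction in recovery time, in percent of the baseline."""
    return 100.0 * (t_baseline - t_method) / t_baseline
```

With hypothetical numbers, a baseline that needs 10 s to recover and a method that needs 1.3 s gives an 87% reduction. Note that an 87% reduction means the method needs about 13% of the baseline's recovery time (roughly 7.7 times shorter), which is not the same as being "87% faster".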