Rimbot, Simple AI, Tongji, Tsinghua University | Jun 1, 2026 | arXiv:2606.02280

Dynamics Are Learned, Not Told: Semi-Supervised Discovery of Latent Dynamics Geometries For Zero-Shot Policy Adaptation

Zhiming Xu, Weitao Zhou, Xianghui Pan, Nanshan Deng, Chengju Liu, Qijun Chen, Chenpeng Yao

AI Summary

This paper addresses reinforcement learning in robotics under real-world dynamics shifts by proposing an outcome-centric approach to dynamics adaptation. Instead of relying on pre-specified physical parameters, the method uses contrastive learning so that policies learn how dynamics affect interaction outcomes, yielding a smooth, task-relevant latent topology. Experiments on MuJoCo benchmarks show that this approach outperforms parameter-centric baselines, particularly under severe dynamics shifts, while also improving in-distribution stability and the interpretability of the learned representations.

Key Contribution

Learning dynamics through interaction outcomes rather than pre-specified physical parameters yields more robust policy adaptation under severe dynamics shifts.

Abstract

Real-world dynamics shifts pose a critical challenge for reinforcement learning in robotics, as policies tightly coupled to nominal environments often fail catastrophically when physical conditions change. Most existing methods rely on encoding explicitly identified physical parameters into a latent context, a parameter-centric paradigm that depends on pre-specified axes of variation and becomes brittle under unmodeled or compound dynamics changes. We revisit dynamics adaptation from an outcome-centric perspective: rather than telling policies what the dynamics are, we enable them to learn how dynamics affect interaction outcomes. Theoretically, this is grounded in a monotonic relationship between target-domain regret and the Lipschitz constant of a trajectory dynamics encoder. Practically, this constant can be upper-bounded through contrastive learning, yielding a smooth, task-relevant latent topology without privileged dynamics information. On MuJoCo benchmarks, our method consistently outperforms parameter-centric baselines under severe dynamics shifts, including unmodeled and time-varying parameters, while also improving in-distribution stability and latent interpretability. Overall, these results validate that controlling latent geometry is a principled mechanism for robust adaptation.
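To make the contrastive step in the abstract more concrete, the sketch below computes an InfoNCE-style loss over encoder outputs. This is an illustrative assumption, not the paper's objective: the abstract does not specify the loss, the positive-pair construction, or the encoder. Here `info_nce_loss`, the `temperature` value, and the synthetic embeddings are all hypothetical, and positives are assumed to be embeddings of trajectory segments collected under the same dynamics as their anchors.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: row i of `anchors` should match row i of `positives`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # Pairwise similarity matrix of shape (N, N), scaled by the temperature.
    logits = a @ p.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))

# Synthetic stand-ins for trajectory embeddings (assumed, not real data).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))                    # anchor embeddings
z_pos = z + 0.01 * rng.normal(size=z.shape)    # near-duplicates: same dynamics
z_shuffled = z_pos[::-1].copy()                # deliberately mismatched pairs

loss_matched = info_nce_loss(z, z_pos)
loss_mismatched = info_nce_loss(z, z_shuffled)
```

In this toy setup the loss is lower when each anchor is paired with an embedding from the same (assumed) dynamics than when the pairs are mismatched, which is the signal a contrastive objective of this kind uses to pull together trajectories with similar outcomes.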

Robotics & Embodied AI | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
