Duke · JHU · May 24, 2026 · arXiv:2605.24810

Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

AI Summary

This paper introduces CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework for off-dynamics offline reinforcement learning, which aims to generate target-domain trajectories from a source-domain diffusion model. CEDGE uses energy guidance, derived from minimizing the distribution mismatch between source and target domains, to adapt generated trajectories. Experiments on the ODRL benchmark show that CEDGE improves diffusion planning and downstream policy learning under dynamics shifts.

Key Contribution

Energy-guided diffusion models can bridge the gap between source and target domains in off-dynamics offline RL, enabling effective trajectory generation without retraining the diffusion model.

Abstract

Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset under mismatched transition dynamics. Existing approaches such as reward augmentation and data filtering are constrained to the source dataset and cannot synthesize new target behavior to improve coverage beyond the collected source trajectories. While recent model-based methods attempt to address this by learning target-aware dynamics, they generate experience one transition at a time, so errors accumulate over long horizons. These limitations motivate a shift toward trajectory-level generation for off-dynamics offline RL. We propose CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework. CEDGE trains a trajectory diffusion model on source-domain trajectories and adapts the generated samples to the target domain through energy guidance. This guidance is derived by minimizing the distribution mismatch between the source and desired target-domain trajectories and is decomposed into return, domain, and behavior energy components. The resulting energy-guided trajectories are useful both for direct planning and as synthetic data for policy learning. Since target adaptation is achieved via energy guidance rather than retraining the diffusion model, CEDGE can be adapted to new target dynamics more efficiently than previous methods. Experiments on the ODRL benchmark demonstrate that trajectory-level energy-guided generation improves diffusion planning under dynamics shifts and produces synthetic data that improves downstream target policy learning.
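To make the idea of energy guidance concrete, the following is a minimal toy sketch and not the CEDGE implementation. It assumes a one-dimensional "trajectory", a standard-normal source distribution whose analytic score stands in for a trained diffusion model, and three hypothetical quadratic energies standing in for the return, domain, and behavior components. The centers 2.0, 1.0, and 0.0, the weights, and plain Langevin sampling are all illustrative assumptions; the abstract does not specify the actual energies or sampler. The point it shows is that the source score is left untouched and only the added energy gradient steers the samples.

```python
import math
import random


def source_score(x):
    # Score (gradient of log-density) of a standard normal "source" model.
    # Assumption: this analytic score stands in for a trained diffusion model.
    return -x


def energy_grad(x, weights=(1.0, 1.0, 1.0)):
    # Gradient of a hypothetical total energy
    #   E(x) = 0.5*w_ret*(x - 2)^2 + 0.5*w_dom*(x - 1)^2 + 0.5*w_beh*x^2
    # The three quadratic terms are placeholders for return, domain,
    # and behavior energies; their centers are arbitrary.
    w_ret, w_dom, w_beh = weights
    return w_ret * (x - 2.0) + w_dom * (x - 1.0) + w_beh * x


def guided_langevin(n_samples=2000, steps=300, step_size=0.01, seed=0):
    # Unadjusted Langevin sampling from p(x) proportional to
    # p_source(x) * exp(-E(x)). Guidance enters only as an extra drift
    # term, so the source model itself is never modified or retrained.
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step_size)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # start from the source distribution
        for _ in range(steps):
            drift = source_score(x) - energy_grad(x)
            x += step_size * drift + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


if __name__ == "__main__":
    samples = guided_langevin()
    print(sum(samples) / len(samples))
```

With unit weights, the guided density is a product of four unit-precision Gaussian factors centered at 0, 2, 1, and 0, so the sample mean should land near 0.75 rather than the source mean of 0. Changing the energy centers or weights moves the samples without touching `source_score`, which mirrors, in a very simplified way, adapting to a new target through guidance alone.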

Robotics & Embodied AI · World Models & Planning

Year: 2026

Venue: N/A
