This paper introduces CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework for off-dynamics offline reinforcement learning, which aims to generate target-domain trajectories from a source-domain diffusion model. CEDGE uses energy guidance, derived from minimizing the distribution mismatch between source and target domains, to adapt generated trajectories. Experiments on the ODRL benchmark show that CEDGE improves diffusion planning and downstream policy learning under dynamics shifts.
Energy-guided diffusion models can bridge the gap between source and target domains in off-dynamics offline RL, enabling effective trajectory generation without retraining the diffusion model.
Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset under mismatched transition dynamics. Existing approaches such as reward augmentation and data filtering are constrained to the source dataset and cannot synthesize new target behavior to improve coverage beyond the collected source trajectories. While recent model-based methods attempt to address this by learning target-aware dynamics, the generated experience is constructed only at the transition level, which leads to accumulated errors over long horizons. These limitations necessitate a shift toward trajectory-level generation for off-dynamics offline RL. We propose CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework. CEDGE trains a trajectory diffusion model on source-domain trajectories and adapts the generated samples to the target domain through energy guidance. This guidance is derived by minimizing the distribution mismatch between the source-domain and desired target-domain trajectories and is decomposed into return, domain, and behavior energy components. The resulting energy-guided trajectories are useful both for direct planning and as synthetic data for policy learning. Since target adaptation is achieved via energy guidance rather than retraining the diffusion model, CEDGE can be adapted to new target dynamics more efficiently than previous methods. Experiments on the ODRL benchmark demonstrate that trajectory-level energy-guided generation improves diffusion planning under dynamics shifts and produces synthetic data that improves downstream target policy learning.
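The idea of steering a fixed source model with a sum of energy terms can be illustrated with a small toy sketch. This is not CEDGE's implementation: the abstract does not specify the energy functions or the sampler, so everything below is an assumption made for illustration. The trained trajectory diffusion model is replaced by a standard Gaussian with a known score, the reverse diffusion process is replaced by plain Langevin dynamics, and the return, domain, and behavior energies are hypothetical linear and quadratic forms. The names `source_score`, `energy_grad`, and `guided_sample`, and all weights and centers, are likewise hypothetical.

```python
import numpy as np


def source_score(x):
    # Score of the stand-in "source model" N(0, I): grad log p_src(x) = -x.
    # Assumption: a real system would query a trained diffusion model here.
    return -x


def energy_grad(x, w_return, w_domain, w_behavior, mu_domain, mu_behavior):
    # Gradient of the total energy
    #   E(x) = w_return * E_return + w_domain * E_domain + w_behavior * E_behavior
    # with hypothetical components chosen so the gradients are analytic:
    #   E_return(x)   = -sum(x)                      (lower energy = higher "return")
    #   E_domain(x)   = 0.5 * ||x - mu_domain||^2    (pull toward target dynamics)
    #   E_behavior(x) = 0.5 * ||x - mu_behavior||^2  (stay near target behavior)
    g_return = -np.ones_like(x)
    g_domain = x - mu_domain
    g_behavior = x - mu_behavior
    return w_return * g_return + w_domain * g_domain + w_behavior * g_behavior


def guided_sample(n, horizon, steps=500, step_size=0.01, seed=0, **energy_kwargs):
    # Langevin sampling from p(x) proportional to p_src(x) * exp(-E(x)).
    # The source score is never retrained; only the guidance term changes.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, horizon))
    for _ in range(steps):
        score = source_score(x) - energy_grad(x, **energy_kwargs)
        noise = rng.standard_normal(x.shape)
        x = x + step_size * score + np.sqrt(2.0 * step_size) * noise
    return x


if __name__ == "__main__":
    kwargs = dict(w_return=0.5, w_domain=1.0, w_behavior=0.5,
                  mu_domain=2.0, mu_behavior=1.0)
    samples = guided_sample(n=2000, horizon=4, **kwargs)
    # Per-timestep sample means of the guided "trajectories".
    print(samples.mean(axis=0))
```

In this toy setting the guided distribution is itself Gaussian, so its mean can be written in closed form as (w_return + w_domain * mu_domain + w_behavior * mu_behavior) / (1 + w_domain + w_behavior), which equals 1.2 for the values above. Adapting to a different target then means changing only the energy arguments, which mirrors, in a much simplified form, the claim that adaptation happens through guidance rather than retraining.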