ExpertGen is a framework that uses reinforcement learning to refine behavior priors learned from imperfect demonstrations (e.g., LLM-synthesized or human) into high-quality expert policies in simulation. It optimizes the initial noise of a frozen diffusion policy, regularizing exploration within human-like behavior manifolds and enabling effective learning from sparse rewards. On manipulation benchmarks, ExpertGen achieves 90.5% success on industrial assembly and 85% on long-horizon tasks, outperforming baselines, and transfers sim-to-real via DAgger distillation into visuomotor policies.
Forget expensive real-world robotics data collection: ExpertGen uses RL to turn noisy, simulated behavior priors (even from LLMs!) into expert policies that transfer to real robots.
Learning generalizable and robust behavior cloning policies requires large volumes of high-quality robotics data. While human demonstrations (e.g., through teleoperation) are the standard source of expert behavior, acquiring such data at scale in the real world is prohibitively expensive. This paper introduces ExpertGen, a framework that automates expert policy learning in simulation to enable scalable sim-to-real transfer. ExpertGen first initializes a behavior prior using a diffusion policy trained on imperfect demonstrations, which may be synthesized by large language models or provided by humans. Reinforcement learning then steers this prior toward high task success by optimizing the diffusion model's initial noise while keeping the original policy frozen. Freezing the pretrained diffusion policy regularizes exploration to remain within safe, human-like behavior manifolds and enables effective learning with only sparse rewards. Empirical evaluations on challenging manipulation benchmarks demonstrate that ExpertGen reliably produces high-quality expert policies with no reward engineering. On industrial assembly tasks, ExpertGen achieves a 90.5% overall success rate, and on long-horizon manipulation tasks it attains 85% overall success, outperforming all baseline methods. The resulting policies exhibit dexterous control and remain robust across diverse initial configurations and failure states. To validate sim-to-real transfer, the learned state-based expert policies are distilled into visuomotor policies via DAgger and successfully deployed on real robotic hardware.
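The core idea — optimizing the initial noise fed into a frozen policy rather than the policy's weights — can be sketched in a toy setting. Everything here is illustrative, not from the paper: `frozen_policy` is a fixed stand-in for the pretrained diffusion policy, the goal and sparse reward are hypothetical, and the update is plain REINFORCE on the mean of the initial-noise Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained diffusion policy: a fixed map from
# initial noise z to an action. Its parameters are never updated.
def frozen_policy(z):
    return np.tanh(z)

# Hypothetical sparse reward: 1 only if the action lands near a goal.
goal = np.array([0.5, -0.3])
def sparse_reward(action):
    return float(np.linalg.norm(action - goal) < 0.25)

# ExpertGen-style idea (sketched with REINFORCE): learn the distribution
# of the *initial noise*, not the policy weights.
mu = np.zeros(2)       # learnable mean of the initial-noise Gaussian
sigma, lr = 0.5, 0.02  # fixed noise scale and learning rate
for _ in range(3000):
    z = rng.normal(mu, sigma)           # sample initial noise
    r = sparse_reward(frozen_policy(z))
    # REINFORCE: grad of log N(z; mu, sigma^2) w.r.t. mu is (z - mu)/sigma^2
    mu += lr * r * (z - mu) / sigma**2
```

Because only the noise distribution is adapted, every explored action still passes through the frozen prior, which is what keeps exploration inside the prior's behavior manifold even under a binary reward.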