Manuscript received June 19

Jun 22, 2026

arXiv:2606.23280

Causal Reward World Models: Zero-shot Reward Design for Automated Skill Generation

Yang Yang, Yuchuang Tong, Zhengtao Zhang, Xu Ding, Ning Yang, Yifan Zhang, Haipeng Li, Kehu Yang, Miao Xin

AI Summary

This paper introduces the Causal Reward World Model (CRWM), which enhances Automated Reward Design (ARD) by modeling causal relationships between reward components and task-specific variables, moving beyond the correlation-driven approaches of existing large language models (LLMs). By employing a coarse-to-fine pre-training strategy and a joint optimization module, CRWM enables zero-shot reward function design, drastically reducing the need for iterative feedback while maintaining or exceeding state-of-the-art performance in complex continuous control tasks. The results demonstrate that CRWM not only accelerates the acquisition of new robotic skills but also generalizes effectively across diverse tasks and robotic embodiments.

Key Contribution

Zero-shot reward function design using CRWM cuts down design latency while achieving state-of-the-art performance in robotic skill acquisition.

Abstract

Automated Reward Design (ARD) aims to replace manual reward engineering in reinforcement learning with language-driven reward function synthesis. However, existing approaches based on large language models (LLMs) remain inherently correlation-driven, relying on iterative environmental feedback to refine reward hypotheses for each specific task. This paradigm not only results in inefficient reasoning but also makes LLMs susceptible to semantically plausible yet causally spurious reward components, leading to ineffective optimization. To address these limitations, we propose the Causal Reward World Model (CRWM), which explicitly models the causal topological relationships between candidate reward components and task-targeted physical variables through offline pre-training on multi-task interaction data. Based on a coarse-to-fine pre-training strategy, we introduce a joint optimization module that integrates Explicit Mechanism Decoupling with Confidence-Aware Soft Fusion to refine coarse structural priors using micro-level trajectories, thereby constructing a robust and interpretable causal skeleton. During inference, LLMs leverage CRWM as a task-irrelevant causal prior to constrain the reward generation, enabling zero-shot reward function design. Our work opens up a new white-box paradigm for the ARD problem. Extensive experiments on complex continuous control benchmarks demonstrate that CRWM generates executable reward functions without feedback-driven reward refinement, significantly reducing the design latency for acquiring new robotic skills while matching or surpassing state-of-the-art performance, and further exhibits strong generalization capabilities across unseen tasks and diverse robotic embodiments.
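The idea of using a causal structure to constrain reward generation can be illustrated with a small sketch. This is not the authors' implementation: the graph, the component and variable names, the confidence values, and the helpers `reaches_target` and `filter_components` are all hypothetical, and a plain thresholded graph search stands in for whatever CRWM does internally. It only shows how a candidate reward component that sounds plausible but has no causal path to the task-targeted variable could be rejected before any environment feedback is collected.

```python
from collections import deque

# Hypothetical causal skeleton: node -> {child: confidence in [0, 1]}.
# All names and numbers are illustrative assumptions, not values from the paper.
CAUSAL_EDGES = {
    "joint_torque_penalty": {"energy_use": 0.9},
    "forward_velocity": {"base_displacement": 0.95},
    "base_displacement": {"distance_to_goal": 0.9},
    "arm_swing_amplitude": {},  # plausible-sounding, but no path to the target
    "energy_use": {},
}


def reaches_target(graph, source, target, min_conf=0.5):
    """Breadth-first search along edges whose confidence is at least min_conf."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for child, conf in graph.get(node, {}).items():
            if conf >= min_conf and child not in seen:
                seen.add(child)
                queue.append(child)
    return False


def filter_components(graph, candidates, target, min_conf=0.5):
    """Keep only candidates with a sufficiently confident causal path to target."""
    return [c for c in candidates if reaches_target(graph, c, target, min_conf)]


candidates = ["forward_velocity", "arm_swing_amplitude", "joint_torque_penalty"]
kept = filter_components(CAUSAL_EDGES, candidates, "distance_to_goal")
print(kept)
```

In this toy graph only `forward_velocity` survives for the target `distance_to_goal`; the other two candidates are dropped because no confident path links them to that variable. In a pipeline of the kind the abstract describes, a filtered list like this would presumably be passed to the LLM as a constraint on which terms it may include in the reward function.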

RLHF & Preference Learning

World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
