This paper introduces Human-in-the-World-Model (Hi-WM), a post-training framework that lets humans intervene in and correct robot policies inside a learned world model rather than on physical hardware. When a policy rollout deviates or fails, a human provides corrective actions directly within the world model, and Hi-WM caches intermediate states so that a single failure can be reused for multiple corrective continuations, yielding dense supervision. Experiments on real-world manipulation tasks demonstrate that Hi-WM significantly improves real-world success rates over base policies and closed-loop world-model baselines, and that world-model evaluation correlates strongly with real-world performance.
Forget expensive real-world robot training: Hi-WM lets humans directly edit a robot's simulated reality, turning world models into powerful, reusable playgrounds for failure recovery.
Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene setup, resets, and operator supervision in the real world. Meanwhile, action-conditioned world models have been studied mainly for imagination, synthetic data generation, and policy evaluation. We propose Human-in-the-World-Model (Hi-WM), a post-training framework that uses a learned world model as a reusable corrective substrate for failure-targeted policy improvement. A policy is first rolled out in closed loop inside the world model; when the rollout becomes incorrect or failure-prone, a human intervenes directly in the model to provide short corrective actions. Hi-WM caches intermediate states and supports rollback and branching, allowing a single failure state to be reused for multiple corrective continuations and yielding dense supervision around behaviors that the base policy handles poorly. The resulting corrective trajectories are then added back to the training set for post-training. We evaluate Hi-WM on three real-world manipulation tasks spanning both rigid and deformable object interaction, and on two policy backbones. Hi-WM improves real-world success by 37.9 points on average over the base policy and by 19.0 points over a world-model closed-loop baseline, while world-model evaluation correlates strongly with real-world performance (r = 0.953). These results suggest that world models can serve not only as generators or evaluators, but also as effective corrective substrates for scalable robot post-training.
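To make the rollout-intervene-branch loop concrete, the sketch below gives a minimal pseudocode rendering of the mechanism the abstract describes: closed-loop rollout inside the world model, state caching, rollback to a flagged failure step, and multiple corrective branches from that same cached state. All interfaces here (world_model.step, world_model.observe, policy.act, human.flags_failure, human.corrective_actions) are hypothetical placeholder names for illustration, not the authors' implementation.

```python
# Illustrative sketch of the Hi-WM correction loop; all interfaces are
# hypothetical placeholders, not the paper's actual API.
from dataclasses import dataclass, field


@dataclass
class Transition:
    obs: object     # world-model observation (e.g., a predicted frame)
    action: object  # action taken from this observation


@dataclass
class StateCache:
    """Caches intermediate world-model states for rollback and branching."""
    states: list = field(default_factory=list)

    def push(self, state):
        self.states.append(state)

    def rollback(self, t):
        # Reuse the cached state at step t as a branch point.
        return self.states[t]


def collect_corrective_trajectories(world_model, policy, init_state,
                                    horizon, n_branches, human):
    cache = StateCache()
    state = init_state
    trajectory = []

    # Closed-loop rollout of the base policy inside the world model.
    for t in range(horizon):
        cache.push(state)
        obs = world_model.observe(state)
        action = policy.act(obs)
        trajectory.append(Transition(obs, action))
        state = world_model.step(state, action)

        # The human flags the rollout as incorrect or failure-prone.
        if human.flags_failure(obs):
            failure_step = t
            break
    else:
        return [trajectory]  # no failure: keep the successful rollout

    # Branch several short corrective continuations from the same cached
    # failure state, yielding dense supervision around the failure mode.
    corrective = []
    for _ in range(n_branches):
        state = cache.rollback(failure_step)
        branch = list(trajectory[:failure_step])
        for action in human.corrective_actions(world_model.observe(state)):
            branch.append(Transition(world_model.observe(state), action))
            state = world_model.step(state, action)
        corrective.append(branch)

    # These corrective trajectories are added back to the post-training set.
    return corrective
```

The key design point this sketch tries to capture is that the cache makes each failure state reusable: rather than re-running the robot (or even the world model) from scratch for every correction, several corrective branches are spawned from one cached state, which is where the claimed supervision density comes from.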