Tsinghua AI | Georgia Tech | Jun 10, 2026 | arXiv:2606.12604

EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations

Yangcen Liu, Shuo Cheng, Xinchen Yin, Woo Chul Shin, Alfred Cueva, Yiran Yang, Zhenyang Chen, Chuye Zhang, Danfei Xu

AI Summary

This paper introduces EgoEngine, a framework that converts egocentric human manipulation videos into high-fidelity robot demonstrations by bridging two gaps between human and robot manipulation: the visual gap in observations and the action gap between human motion and robot-executable actions. From an egocentric RGB video, EgoEngine generates a robot observation video that preserves scene context and temporal alignment, together with a task-aligned action trajectory that satisfies feasibility constraints. In experiments in simulation and on real robots, the authors report what is, to their knowledge, the first zero-shot visuomotor dexterous policy learning from egocentric human videos without real-robot demonstrations.

Key Contribution

EgoEngine transforms egocentric human manipulation videos into executable robot demonstrations, enabling zero-shot dexterous policy learning without real-robot demonstrations.

Abstract

Dexterous manipulation is limited by the cost of collecting large-scale robot demonstrations. Egocentric human videos offer a scalable source of diverse manipulation behaviors, but using them directly for robot learning requires bridging two gaps: the visual gap between human and robot observations, and the action gap between human motion and robot-executable actions. We propose EgoEngine, a scalable framework for transforming egocentric human manipulation videos into high-fidelity robot data. Given an egocentric RGB video, EgoEngine produces: (i) a high-fidelity robot observation video that replaces the human with a robot while preserving scene context and temporal alignment, and (ii) a task-aligned, executable robot action trajectory that satisfies feasibility constraints. Experiments in simulation and on real robots show that EgoEngine enables scalable conversion of human videos into robot data and, to our knowledge, demonstrates the first zero-shot visuomotor dexterous policy learning from egocentric human videos without real-robot demonstrations. Project website: https://egoengine.github.io.
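The abstract describes each converted demonstration as a pair: a robot observation video that stays temporally aligned with the source video, and an action trajectory that respects feasibility constraints. As an illustration only, the sketch below shows one way such a pair could be stored and sanity-checked. Nothing here comes from the paper: the `RobotDemonstration` container, the `validate` helper, the array shapes, and the choice of per-joint position limits as the feasibility constraint are all assumptions made for this example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RobotDemonstration:
    """Hypothetical container for one converted demonstration (not the paper's format)."""

    observations: np.ndarray  # assumed shape (T, H, W, 3): robot-view RGB frames
    actions: np.ndarray       # assumed shape (T, D): one joint-position target per frame


def validate(demo, source_num_frames, joint_lower, joint_upper):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Assumptions for this sketch: temporal alignment means one output frame and one
    action per source frame, and feasibility means staying inside joint limits.
    """
    problems = []
    num_frames = demo.observations.shape[0]

    # Temporal alignment with the source human video.
    if num_frames != source_num_frames:
        problems.append(
            f"observation video has {num_frames} frames, source has {source_num_frames}"
        )

    # One action per observation frame.
    if demo.actions.shape[0] != num_frames:
        problems.append(
            f"{demo.actions.shape[0]} actions for {num_frames} observation frames"
        )

    # Feasibility, approximated here as a box constraint on joint positions.
    out_of_range = (demo.actions < joint_lower) | (demo.actions > joint_upper)
    if out_of_range.any():
        bad_steps = np.unique(np.nonzero(out_of_range)[0]).tolist()
        problems.append(f"joint limits violated at time steps {bad_steps}")

    return problems
```

A real system would likely need richer checks, such as collision or reachability tests, which this sketch does not attempt.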

Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
