Beihang · DeepCybo · TJU · Zhongguancun Academy · Zhongguancun Institute of Artificial · Jun 7, 2026 · arXiv:2606.08495

EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

Haoyang Ge, Peng Ren, Yukun Shi, Cong Huang, Kun Li, Kai Chen

AI Summary

This paper introduces EgoPriMo, a framework that learns egocentric motion priors for humanoid robots from human demonstrations and reconstructs, generates, and forecasts full-body motion from egocentric observations and a text prompt. A Triple-stream DiT jointly models body dynamics, egocentric visual context, and text, with language serving as a high-level control signal rather than a complete motion specification. On Nymeria and EgoExo4D, a single checkpoint improves egocentric motion generation over UniEgoMotion while also supporting reconstruction and forecasting, and the generated SMPL motions can be executed by a Unitree humanoid controller.

Key Contribution

EgoPriMo enables humanoid robots to generate and forecast complex motions interactively using just egocentric observations and high-level language prompts.

Abstract

Humanoid robots require whole-body motions that adapt to scene context, task requirements, and user intent. Motion tracking reproduces specified trajectories, and humanoid vision-language-action systems provide semantic interfaces, but neither offers a scalable and interactive prior for broad full-body behavior. We introduce EgoPriMo (Egocentric Motion Prior for Humanoid Robots), a unified framework that learns such priors from egocentric human demonstrations. Given egocentric observations and a text prompt, EgoPriMo reconstructs, generates, and forecasts SMPL-based full-body motion. Language is used as a high-level control signal rather than a complete motion specification. At the core of EgoPriMo is a Triple-stream DiT that jointly models body dynamics, egocentric visual context, and text; task-conditioning masks route different tasks and missing-modality data through the same checkpoint. Experiments on Nymeria and EgoExo4D show that one checkpoint improves egocentric motion generation over UniEgoMotion while supporting reconstruction and forecasting; the generated SMPL motions can also be executed by a Unitree humanoid controller. These results indicate a practical path from scalable egocentric observations to generalizable and interactive humanoid motion priors.
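The task-conditioning idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the masks are laid out, so everything below is an assumption made for illustration: the `ConditionMask` type, the `build_mask` function, the per-frame visibility convention (1 = observed, 0 = hidden), and the particular visibility pattern chosen for each task are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

# Task names come from the abstract; the mask patterns below are assumptions.
TASKS = ("reconstruction", "generation", "forecasting")


@dataclass
class ConditionMask:
    vision: List[int]  # per-frame flag: 1 = egocentric frame is visible to the model
    text: int          # 1 = text prompt is available, 0 = missing


def build_mask(task: str, num_frames: int, num_past: int = 0,
               has_vision: bool = True, has_text: bool = True) -> ConditionMask:
    """Build a hypothetical conditioning mask for one training or inference sample."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if num_frames < 1 or not 0 <= num_past <= num_frames:
        raise ValueError("need num_frames >= 1 and 0 <= num_past <= num_frames")

    if task == "reconstruction":
        # Assumed: every egocentric frame in the window is observed.
        vision = [1] * num_frames
    elif task == "generation":
        # Assumed: only the first frame is observed; the rest is generated.
        vision = [1] + [0] * (num_frames - 1)
    else:
        # Assumed: the first num_past frames are observed; the future is hidden.
        vision = [1] * num_past + [0] * (num_frames - num_past)

    # Missing-modality samples reuse the same mechanism: hide the whole stream.
    if not has_vision:
        vision = [0] * num_frames
    return ConditionMask(vision=vision, text=1 if has_text else 0)


if __name__ == "__main__":
    print(build_mask("forecasting", num_frames=6, num_past=2))
    print(build_mask("generation", num_frames=4, has_text=False))
```

Under these assumptions, one model can serve all three tasks because the task is encoded entirely in which inputs are visible, and a sample that lacks a modality is handled by the same mask rather than by a separate checkpoint.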

Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
