This paper introduces OMG, an omni-modal motion generation framework for generalist humanoid control. It pairs a scalable reasoning module, which conditions on diverse input modalities, with a reactive motion tracking module. This design addresses the limits of existing approaches, which either learn only a few skills with heavy reward engineering or rely on motion trackers that are hard to extend to new input modalities. The authors tackle two challenges, acquiring large amounts of high-quality data and enabling the generator to condition on diverse multi-modal inputs, with a data curation, filtering, and labeling pipeline and a diffusion-based motion generation backbone. Experiments show that OMG achieves state-of-the-art performance, exhibits model scaling behavior, and adapts efficiently to new distributions and modalities, a step toward foundation models for humanoid robotics.
OMG lets humanoid robots respond to language, audio, and human reference motions through a single whole-body controller, with state-of-the-art results reported for generalist control.
Humanoid whole-body control has made significant progress in recent years, yet existing approaches remain limited either to few-skill policies that require heavy reward engineering or to motion trackers that are difficult to extend to new input modalities. We argue that the key to general-purpose humanoid control is to build a scalable "brain", a module capable of reasoning over diverse conditioning modalities, atop a reactive motion tracking "cerebellum", mirroring the hierarchical structure of biological motor systems. Two challenges arise in realizing this vision: acquiring a vast amount of high-quality data to achieve general-purpose control, and equipping the generator with the capability to condition on compositional, extensible multi-modal inputs. We present OMG, which addresses these challenges with a meticulous data curation, filtering, and labeling pipeline, as well as a diffusion-based motion generation backbone that conditions on language, audio, and human reference motions. Extensive experiments validate OMG as an omni-modal whole-body controller exhibiting state-of-the-art performance, model scaling behavior, and efficient adaptation to new distributions and modalities, marking a concrete step toward foundation models for humanoid robots.
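The abstract does not say how OMG fuses modalities or how its tracker is built, so the following is only a toy sketch of the brain/cerebellum split under assumed choices, not the authors' method. Assumptions: each modality arrives as an optional embedding, and an absent modality is replaced by a learned "null" embedding (a common classifier-free-guidance-style trick, assumed here). The "brain" is a deterministic DDIM sampling loop whose denoiser is a toy stand-in for a trained network, and the "cerebellum" is a PD controller on a unit-mass double integrator rather than a learned tracking policy. All names (`MODALITIES`, `fuse_conditions`, `toy_denoiser`, `generate_motion`, `track`) and sizes are hypothetical.

```python
import numpy as np

# Hypothetical modality names and sizes; OMG's real encoders are not described in the abstract.
MODALITIES = ("language", "audio", "motion")
EMB_DIM, POSE_DIM, FRAMES = 16, 8, 60

rng = np.random.default_rng(0)
# Random stand-ins for learned parameters: one null embedding per modality and a linear head.
NULL_EMB = {m: rng.normal(size=EMB_DIM) for m in MODALITIES}
W_HEAD = rng.normal(scale=0.3, size=(POSE_DIM, EMB_DIM))


def fuse_conditions(conds):
    """Sum per-modality embeddings; an absent modality contributes its null embedding."""
    unknown = set(conds) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    return sum(conds.get(m, NULL_EMB[m]) for m in MODALITIES)


def toy_denoiser(x_t, t, cond):
    """Toy x0-predictor that ignores x_t and t; a trained network would use all three."""
    pose = np.tanh(W_HEAD @ cond)            # one pose of size POSE_DIM
    return np.broadcast_to(pose, x_t.shape)  # repeated for every frame


def generate_motion(conds, steps=20, seed=1):
    """Hypothetical 'brain': deterministic DDIM sampling of a (FRAMES, POSE_DIM) motion."""
    cond = fuse_conditions(conds)
    alpha_bar = np.linspace(0.999, 0.01, steps)  # index 0 ~ clean, last index ~ pure noise
    x = np.random.default_rng(seed).normal(size=(FRAMES, POSE_DIM))
    for i in reversed(range(1, steps)):
        x0_hat = toy_denoiser(x, i, cond)
        # Noise implied by the current sample and the predicted clean motion.
        eps_hat = (x - np.sqrt(alpha_bar[i]) * x0_hat) / np.sqrt(1.0 - alpha_bar[i])
        a_prev = alpha_bar[i - 1]
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat  # DDIM step, eta = 0
    return np.array(toy_denoiser(x, 0, cond))  # final step: return the clean prediction


def track(reference, kp=100.0, kd=20.0, dt=0.005, substeps=10):
    """Hypothetical 'cerebellum': PD tracking of each frame on a unit-mass double integrator."""
    q = np.zeros(reference.shape[1])
    qd = np.zeros_like(q)
    executed = []
    for target in reference:
        for _ in range(substeps):
            torque = kp * (target - q) - kd * qd  # PD control law
            qd += dt * torque                      # semi-implicit Euler integration
            q += dt * qd
        executed.append(q.copy())
    return np.stack(executed)


if __name__ == "__main__":
    ref = generate_motion({"language": rng.normal(size=EMB_DIM)})  # audio/motion omitted
    executed = track(ref)
    print("max final tracking error:", np.abs(executed[-1] - ref[-1]).max())
```

The point of the null-embedding choice is extensibility: any subset of modalities can be supplied at inference time, and adding a modality in this sketch only means adding a name and its null embedding. Keeping generation and tracking as separate functions mirrors the hierarchy the abstract argues for, where the generator can change without retraining the low-level controller.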