The paper presents a bottom-up approach for transferring simulated policies to the Boston Dynamics Atlas humanoid robot, enabling it to perform dynamic tasks with unprecedented natural behaviors. The authors streamline zero-shot sim-to-real transfer by carefully selecting RL framework components and automating deployment, focusing on mimicking stylized kinematic motions from human motion capture or animation data. The key result is high-quality, human-like motion on the physical robot, achieved with minimal domain randomization and without complex reward shaping or reliance on teleoperation data.
Boston Dynamics' Atlas robot now performs unprecedented natural behaviors by streamlining zero-shot sim-to-real transfer of policies trained to mimic stylized kinematic motions.
This talk will showcase unprecedented natural behaviors on the Boston Dynamics Atlas humanoid robot performing dynamic tasks, marking a major advance in closing the gap between human characters in graphics and physical humanoids in robotics. In this work, we employed a bottom-up approach to facilitate physical intelligence. Our primary focus was on streamlining the zero-shot sim-to-real transfer of policies trained to mimic stylized kinematic motions, either captured from humans or designed by animators. By carefully selecting components of our Reinforcement Learning (RL) framework and automating the deployment process on the hardware, we achieved high-quality motions while minimizing excessive domain randomization and avoiding the need for complicated reward shaping.

While this work focuses on embodied physical intelligence, progress in the broader field of Artificial Intelligence (AI) has largely been driven by advances in perception and cognitive intelligence. The human cognitive imprint on the Internet has fueled the rise of Foundation Models, such as LLMs and VLMs, bringing us closer than ever to the holy grail of AI: an agent that understands the world and acts accordingly. Yet, despite these remarkable advances in the virtual realm, the "act" component of AI continues to lag behind. Embodied intelligence has proven challenging, as a physical agent must grapple with the complexity, uncertainty, and variability of the real world.

It is tempting to apply the same methodologies that revolutionized virtual domains, namely scaling up models and datasets, but a crucial difference remains: unlike the rich, structured, and abundant data on the Internet, human motion data is sparse, incomplete, and typically lacks explicit action labels. To address this challenge, some researchers have turned to teleoperation to collect in-morphology motion data with action labels, leveraging pre-trained foundation models to bootstrap the training process.
Rather than immediately training a generalist model capable of performing a wide variety of tasks, our first milestone focused on transferring single-task policies trained in simulation to hardware in a zero-shot manner. Furthermore, we used only a motion dataset containing state transitions without action labels, avoiding any reliance on teleoperation data. We drew upon data collected from humans through motion capture, videos, or animation. To preserve the motion style during deployment on physical hardware, we minimized excessive domain randomization, as it can wash out subtle motion details on the hardware. Building on this foundation, we developed an automated pipeline for processing motion data and performing zero-shot sim-to-real transfer using RL, minimizing the need for human intervention.

We then expanded this framework to support multi-task policies that generalize across various behaviors. To synthesize human-like motions from high-level operator commands, we trained motion generation models using Diffusion Transformers on motion data that we collected ourselves. The trained motion generation model is used during both training and inference to provide in-context motion references for the RL policy. While this work focuses on bridging the gap at the level of embodied physical intelligence, full cognitive integration remains a broader challenge, and progress will come from converging bottom-up and top-down approaches.
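Training a policy to mimic kinematic references, as described above, is commonly implemented with a DeepMimic-style tracking reward: at each control step the policy is rewarded for how closely the simulated state matches the reference frame. Below is a minimal illustrative sketch of such a reward. The term weights, exponential scales, and state layout are assumptions for the example only, not the design actually used in this work:

```python
import numpy as np

def tracking_reward(sim, ref, w_pose=0.6, w_vel=0.2, w_root=0.2,
                    k_pose=2.0, k_vel=0.1, k_root=5.0):
    """DeepMimic-style imitation reward (illustrative sketch).

    `sim` and `ref` are dicts holding the simulated and reference
    frames: 'q' (joint angles), 'dq' (joint velocities), and 'root'
    (base position). Weights and scales are placeholder values.
    """
    e_pose = np.sum((sim["q"] - ref["q"]) ** 2)
    e_vel = np.sum((sim["dq"] - ref["dq"]) ** 2)
    e_root = np.sum((sim["root"] - ref["root"]) ** 2)
    # Each term is an RBF kernel on a tracking error: a perfect match
    # gives reward 1, and large errors decay smoothly toward 0, which
    # keeps the reward dense without hand-crafted shaping terms.
    return (w_pose * np.exp(-k_pose * e_pose)
            + w_vel * np.exp(-k_vel * e_vel)
            + w_root * np.exp(-k_root * e_root))
```

Because the reward only compares states against the reference, it needs no action labels, which is what lets a pipeline like this consume motion-capture or animation data directly.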