BifrostUMI is a robot-free data collection framework for humanoid whole-body manipulation. It uses VR devices to capture human demonstrations as sparse keypoint trajectories paired with wrist-mounted visual data. A high-level policy network is trained to predict future keypoint trajectories from the visual features; a retargeting pipeline then maps these trajectories onto the robot's morphology, and a whole-body controller executes them. Experiments in two scenarios show that the framework can transfer diverse human behaviors to humanoid robots.
Unlock agile humanoid robots by ditching teleoperation and training directly from human VR demos.
High-quality data collection is a cornerstone of training humanoid whole-body visuomotor policies. Current data acquisition paradigms rely predominantly on robot teleoperation, which is often hindered by limited hardware accessibility and low operational efficiency. Inspired by the Universal Manipulation Interface (UMI), we propose BifrostUMI, a portable, efficient, and robot-free data collection framework tailored for humanoid robots. BifrostUMI uses lightweight VR devices to capture human demonstrations as sparse keypoint trajectories while simultaneously recording wrist-mounted visual data. These multimodal data are then used to train a high-level policy network that predicts future keypoint trajectories conditioned on the captured visual features. A keypoint retargeting pipeline maps the predicted trajectories onto the robot's morphology, and a whole-body controller executes them. This approach enables the transfer of diverse and agile behaviors from natural human demonstrations to humanoid embodiments. We demonstrate the efficacy and versatility of the proposed framework in two distinct experimental scenarios.
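The abstract does not describe how the keypoint retargeting step works. The sketch below is one simple possibility, not the authors' method. It assumes that keypoints are stored relative to a root frame (for example the pelvis) and that a single uniform scale, the ratio of a robot body dimension to the matching human one, is enough to map them onto the robot. The names `KeypointFrame`, `retarget_trajectory`, `human_size`, and `robot_size` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class KeypointFrame:
    """One timestep of sparse keypoints in a root-relative frame (metres).

    Hypothetical container: the keypoint names and the frame convention
    are assumptions, not taken from the paper.
    """
    timestamp: float
    keypoints: Dict[str, Vec3]


def scale_about_root(point: Vec3, scale: float) -> Vec3:
    """Scale a root-relative point uniformly about the root origin."""
    return (point[0] * scale, point[1] * scale, point[2] * scale)


def retarget_trajectory(
    frames: List[KeypointFrame],
    human_size: float,
    robot_size: float,
) -> List[KeypointFrame]:
    """Map a human keypoint trajectory onto a differently sized robot.

    Assumption: uniform scaling by robot_size / human_size (for example
    arm length or standing height). A real pipeline would likely also
    handle per-limb proportions, orientations, and joint limits.
    """
    if human_size <= 0 or robot_size <= 0:
        raise ValueError("body sizes must be positive")
    scale = robot_size / human_size
    return [
        KeypointFrame(
            timestamp=frame.timestamp,
            keypoints={
                name: scale_about_root(point, scale)
                for name, point in frame.keypoints.items()
            },
        )
        for frame in frames
    ]


# Usage: a two-frame demonstration retargeted to a robot half the human's size.
demo = [
    KeypointFrame(0.0, {"right_wrist": (0.4, 0.2, 1.0)}),
    KeypointFrame(0.1, {"right_wrist": (0.6, 0.2, 1.0)}),
]
robot_demo = retarget_trajectory(demo, human_size=2.0, robot_size=1.0)
```

Timestamps pass through unchanged, so the retargeted trajectory keeps the timing of the human demonstration. In the described framework, the whole-body controller would then track these retargeted keypoints.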