This paper introduces a rapid deployment pipeline for humanoid robots that cuts the onboarding time for manipulating a new object from one to two days to approximately 30 minutes. The pipeline integrates automatic annotation for YOLOv8 object detection, 3D reconstruction using Meta SAM 3D, and zero-shot pose tracking with FoundationPose, enabling efficient grasping and manipulation. The system achieves high detection accuracy (mAP@0.5 = 0.995) and precise pose tracking (σ < 1.05 mm), and is demonstrated in real-world scenarios, including a glue-application task on an automobile window.
Onboarding a new object for humanoid manipulation in about 30 minutes, rather than one to two days, could substantially improve deployment efficiency in robotics.
Deploying a humanoid robot to manipulate a new object has traditionally required one to two days of effort: data collection, manual annotation, 3D model acquisition, and model training. This paper presents an end-to-end rapid deployment pipeline that integrates three foundation-model components to shorten the onboarding cycle for a new object to approximately 30 minutes: (i) Roboflow-based automatic annotation to assist in training a YOLOv8 object detector; (ii) 3D reconstruction based on Meta SAM 3D, which eliminates the need for a dedicated laser scanner; and (iii) zero-shot 6-DoF pose tracking based on FoundationPose, using the SAM 3D-generated mesh directly as the template. The estimated pose drives a Unity-based inverse kinematics planner, whose joint commands are streamed via UDP to a Unitree G1 humanoid and executed through the Unitree SDK. We demonstrate detection accuracy of mAP@0.5 = 0.995, pose tracking precision of $\sigma < 1.05$ mm, and successful grasping on a real robot at five positions within the workspace. We further verify the generality of the pipeline on an automobile-window glue-application task. The results show that combining foundation models for perception with everyday imaging devices (e.g., smartphones) can substantially lower the deployment barrier for humanoid manipulation tasks.
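The abstract says that joint commands are streamed over UDP from the planner to the robot, but it does not describe the packet layout. The following is a minimal sketch of how such a stream could be framed, written in Python for illustration only (the planner described in the paper is Unity-based). The wire format, `NUM_JOINTS`, the sequence-number field, and all function names are assumptions introduced here and are not taken from the paper or from the Unitree SDK.

```python
import socket
import struct

# Hypothetical: number of joints in the commanded chain (not from the paper).
NUM_JOINTS = 7

# Hypothetical wire format: little-endian uint32 sequence number followed by
# NUM_JOINTS float32 joint angles in radians. "<" disables padding.
PACKET_FORMAT = "<I" + "f" * NUM_JOINTS
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)


def pack_joint_command(seq, angles_rad):
    """Serialize one joint command into a fixed-size datagram payload."""
    if len(angles_rad) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} angles, got {len(angles_rad)}")
    return struct.pack(PACKET_FORMAT, seq, *angles_rad)


def unpack_joint_command(payload):
    """Parse a datagram payload back into (sequence number, list of angles)."""
    if len(payload) != PACKET_SIZE:
        raise ValueError(f"expected {PACKET_SIZE} bytes, got {len(payload)}")
    seq, *angles = struct.unpack(PACKET_FORMAT, payload)
    return seq, angles


def send_joint_command(sock, address, seq, angles_rad):
    """Send one command datagram; UDP gives no delivery or ordering guarantee."""
    sock.sendto(pack_joint_command(seq, angles_rad), address)


if __name__ == "__main__":
    # Loopback demonstration: the receiver stands in for the robot-side process.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(1.0)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_joint_command(sender, receiver.getsockname(), 1, [0.0] * NUM_JOINTS)

    payload, _ = receiver.recvfrom(PACKET_SIZE)
    print(unpack_joint_command(payload))

    sender.close()
    receiver.close()
```

Because UDP can drop or reorder datagrams, a sequence number lets the receiver discard stale commands. A fixed-size payload keeps parsing trivial on the robot side. Both are design choices made for this sketch, not details reported by the authors.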