Lang2Pose is a modular, ROS 2-based robot control system that uses an LLM to interpret natural language commands for end-effector control and pick-and-place tasks. The system leverages FoundationPose for 6D object pose estimation from RGB-D data and segmentation masks, and uses Lula IK (simulation) and MoveIt 2 (real-world) for motion planning. Validated on a Kinova Gen3 arm with a Robotiq gripper in both Isaac Sim and real-world settings, Lang2Pose demonstrates effective language-guided manipulation, even under partial occlusion.
Control a robot arm with natural language commands, no coding required.
We present Lang2Pose, a modular robot control system that interprets natural language commands to perform either end-effector control or pick-and-place actions. Built on ROS 2, our framework integrates a large language model (LLM), a perception module leveraging FoundationPose for 6D object pose estimation, and motion planning components using Lula IK for simulation and MoveIt 2 for real-world execution. The system transforms user commands into structured actions, estimates object poses from RGB-D input and segmentation masks, and computes grasp configurations based on object geometry and orientation. We validate the approach in both simulated and physical environments using a Kinova Gen3 arm and a Robotiq 2F-85 gripper. Simulated trials were conducted in Isaac Sim with tuned high-fidelity physics, while real-world experiments employed fine-tuned YOLO and YOLO-seg models and Intel RealSense RGB-D data. Lang2Pose enables intuitive language-guided manipulation and demonstrates robust performance under partial occlusion. These results highlight the system’s potential as a flexible interface for natural human-robot interaction in both virtual and physical domains.
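To make the pipeline concrete, the sketch below illustrates one plausible shape for two of its steps: validating an LLM reply into a structured action, and deriving a grasp pose from an estimated 6D object pose. The JSON schema, the `PickPlaceAction` type, the `parse_llm_action` helper, and the top-down grasp rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: validate an LLM reply into a structured action and
# derive a simple top-down grasp from an estimated 6D object pose.
# The schema and grasp rule are illustrative assumptions, not Lang2Pose's API.
import json
from dataclasses import dataclass

import numpy as np


@dataclass
class PickPlaceAction:
    object_name: str    # label used to select the segmentation mask
    place_xyz: tuple    # drop-off position in the robot base frame


def parse_llm_action(reply: str) -> PickPlaceAction:
    """Validate the LLM's JSON reply, e.g.
    {"action": "pick_place", "object": "red_cube", "place": [0.4, -0.2, 0.1]}."""
    data = json.loads(reply)
    if data.get("action") != "pick_place":
        raise ValueError(f"unsupported action: {data.get('action')!r}")
    x, y, z = data["place"]  # fail loudly on malformed coordinates
    return PickPlaceAction(object_name=data["object"], place_xyz=(x, y, z))


def top_down_grasp(T_obj: np.ndarray, approach_offset: float = 0.10) -> np.ndarray:
    """Given a 4x4 object pose in the base frame (as a pose estimator such as
    FoundationPose would provide), return a pre-grasp pose: gripper pointing
    straight down, aligned with the object's yaw, offset above its origin."""
    yaw = np.arctan2(T_obj[1, 0], T_obj[0, 0])       # object heading in the plane
    c, s = np.cos(yaw), np.sin(yaw)
    T_grasp = np.eye(4)
    T_grasp[:3, :3] = np.array([[c, s, 0.0],
                                [s, -c, 0.0],
                                [0.0, 0.0, -1.0]])   # gripper z-axis points down
    T_grasp[:3, 3] = T_obj[:3, 3] + [0.0, 0.0, approach_offset]
    return T_grasp
```

In a full pipeline, the resulting pose would be handed to the motion planner (MoveIt 2 on hardware, Lula IK in simulation) as the end-effector goal; the fixed downward approach and yaw-only alignment are deliberate simplifications of a grasp computed from object geometry and orientation.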