Lang2Pose is a modular, ROS 2-based robot control system that uses an LLM to interpret natural language commands for end-effector control and pick-and-place tasks. The system leverages FoundationPose for 6D object pose estimation from RGB-D data and segmentation masks, and uses Lula IK (simulation) and MoveIt 2 (real-world) for motion planning. Validated on a Kinova Gen3 arm with a Robotiq gripper in both Isaac Sim and real-world settings, Lang2Pose demonstrates effective language-guided manipulation, even under partial occlusion.
Control a robot arm with natural language commands, no coding required.
We present Lang2Pose, a modular robot control system that interprets natural language commands to perform either end-effector control or pick-and-place actions. Built on ROS 2, our framework integrates a large language model (LLM), a perception module leveraging FoundationPose for 6D object pose estimation, and motion planning components using Lula IK for simulation and MoveIt 2 for real-world execution. The system transforms user commands into structured actions, estimates object poses from RGB-D input and segmentation masks, and computes grasp configurations based on object geometry and orientation. We validate the approach in both simulated and physical environments using a Kinova Gen3 arm and a Robotiq 2F-85 gripper. Simulated trials were conducted in Isaac Sim with tuned high-fidelity physics, while real-world experiments employed fine-tuned YOLO and YOLO-seg models and Intel RealSense RGB-D data. Lang2Pose enables intuitive language-guided manipulation and demonstrates robust performance under partial occlusion. These results highlight the system’s potential as a flexible interface for natural human-robot interaction in both virtual and physical domains.
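To make the pipeline concrete, the sketch below illustrates one plausible shape for two of its steps: validating an LLM reply into a structured action, and deriving a grasp pose from an estimated 6D object pose. The JSON schema, the `PickPlaceAction` type, the `parse_llm_action` helper, and the top-down grasp rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: validate an LLM reply into a structured action and
# derive a simple top-down grasp from an estimated 6D object pose.
# The schema and grasp rule are illustrative assumptions, not Lang2Pose's API.
import json
from dataclasses import dataclass

import numpy as np


@dataclass
class PickPlaceAction:
    object_name: str    # label used to select the segmentation mask
    place_xyz: tuple    # drop-off position in the robot base frame


def parse_llm_action(reply: str) -> PickPlaceAction:
    """Validate the LLM's JSON reply, e.g.
    {"action": "pick_place", "object": "red_cube", "place": [0.4, -0.2, 0.1]}."""
    data = json.loads(reply)
    if data.get("action") != "pick_place":
        raise ValueError(f"unsupported action: {data.get('action')!r}")
    x, y, z = data["place"]  # fail loudly on malformed coordinates
    return PickPlaceAction(object_name=data["object"], place_xyz=(x, y, z))


def top_down_grasp(T_obj: np.ndarray, approach_offset: float = 0.10) -> np.ndarray:
    """Given a 4x4 object pose in the base frame (as a pose estimator such as
    FoundationPose would provide), return a pre-grasp pose: gripper pointing
    straight down, aligned with the object's yaw, offset above its origin."""
    yaw = np.arctan2(T_obj[1, 0], T_obj[0, 0])       # object heading in the plane
    c, s = np.cos(yaw), np.sin(yaw)
    T_grasp = np.eye(4)
    T_grasp[:3, :3] = np.array([[c, s, 0.0],
                                [s, -c, 0.0],
                                [0.0, 0.0, -1.0]])   # gripper z-axis points down
    T_grasp[:3, 3] = T_obj[:3, 3] + [0.0, 0.0, approach_offset]
    return T_grasp
```

In a full pipeline, the resulting pose would be handed to the motion planner (MoveIt 2 on hardware, Lula IK in simulation) as the end-effector goal; the fixed downward approach and yaw-only alignment are deliberate simplifications of a grasp computed from object geometry and orientation.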