This paper introduces a markerless stereo 6D pose estimation and position-based visual servoing framework for continuum manipulators in endoscopic surgical systems, addressing challenges related to hysteresis, compliance, and limited distal sensing. The method uses a stereo-aware multi-feature fusion network trained on photo-realistic simulated data, then refines its output with a feed-forward rendering-based module and self-supervised sim-to-real adaptation. Real-world experiments report a mean pose estimation error of 0.83 mm in translation and 2.76° in rotation. In closed-loop visual servoing, tracking errors are 85% (translation) and 59% (rotation) lower than under open-loop control.
Achieve precise closed-loop control of endoscopic continuum manipulators without markers or embedded sensors using a novel markerless 6D pose estimation and visual servoing framework.
Continuum manipulators in flexible endoscopic surgical systems offer high dexterity for minimally invasive procedures; however, accurate pose estimation and closed-loop control remain challenging due to hysteresis, compliance, and limited distal sensing. Vision-based approaches reduce hardware complexity but are often constrained by limited geometric observability and high computational overhead, restricting real-time closed-loop applicability. This paper presents a unified framework for markerless stereo 6D pose estimation and position-based visual servoing of continuum manipulators. A photo-realistic simulation pipeline enables large-scale automatic training with pixel-accurate annotations. A stereo-aware multi-feature fusion network jointly exploits segmentation masks, keypoints, heatmaps, and bounding boxes to enhance geometric observability. To enforce geometric consistency without iterative optimization, a feed-forward rendering-based refinement module predicts residual pose corrections in a single pass. A self-supervised sim-to-real adaptation strategy further improves real-world performance using unlabeled data. Extensive real-world validation achieves a mean translation error of 0.83 mm and a mean rotation error of 2.76° across 1,000 samples. Markerless closed-loop visual servoing driven by the estimated pose attains accurate trajectory tracking with a mean translation error of 2.07 mm and a mean rotation error of 7.41°, corresponding to 85% and 59% reductions compared to open-loop control, together with high repeatability in repeated point-reaching tasks. To the best of our knowledge, this work presents the first fully markerless pose-estimation-driven position-based visual servoing framework for continuum manipulators, enabling precise closed-loop control without physical markers or embedded sensing.
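The abstract reports translation and rotation errors and describes position-based visual servoing (PBVS) driven by the estimated pose. As a minimal illustration, the sketch below shows one common way such quantities can be computed: Euclidean distance for translation, geodesic angle for rotation, and a simple proportional PBVS update. It is not taken from the paper. The function names (`pose_error`, `pbvs_step`, `make_pose`, `rot_z`), the 4x4 homogeneous pose representation, and the gain value are assumptions made for illustration, and the authors' actual metrics and controller may differ.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the example)."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose_error(T_est, T_ref):
    """Return (translation error, rotation error in degrees) between two poses.

    Translation error is the Euclidean distance between the origins.
    Rotation error is the geodesic angle of the relative rotation.
    """
    dt = np.linalg.norm(T_est[:3, 3] - T_ref[:3, 3])
    R_rel = T_ref[:3, :3].T @ T_est[:3, :3]
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return dt, np.degrees(np.arccos(cos_theta))


def pbvs_step(T_cur, T_des, gain=0.5):
    """One proportional PBVS update (hypothetical controller, not the paper's).

    Returns (v, w): a translational velocity and an angular velocity
    (axis-angle form) expressed in the base frame, both proportional to the
    pose error. Rotations very close to 180 degrees are not handled here.
    """
    e_t = T_des[:3, 3] - T_cur[:3, 3]
    R_err = T_des[:3, :3] @ T_cur[:3, :3].T
    cos_theta = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-9:
        e_r = np.zeros(3)
    else:
        axis = np.array([
            R_err[2, 1] - R_err[1, 2],
            R_err[0, 2] - R_err[2, 0],
            R_err[1, 0] - R_err[0, 1],
        ]) / (2.0 * np.sin(theta))
        e_r = theta * axis
    return gain * e_t, gain * e_r


# Example: current pose at the origin, target offset by (3, 4, 0) mm and
# rotated 10 degrees about z.
T_cur = np.eye(4)
T_des = make_pose(rot_z(10.0), [3.0, 4.0, 0.0])
dt, dr = pose_error(T_cur, T_des)
v, w = pbvs_step(T_cur, T_des, gain=0.5)
```

In a closed loop, `T_cur` would be replaced at each control cycle by the pose estimated from the stereo images, so the commanded velocities shrink as the manipulator approaches the target.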