Mar 9, 2026 · arXiv:2603.08324

EndoSERV: A Vision-based Endoluminal Robot Navigation System

Junyang Wu, Fangfang Xie, Minghui Zhang, Hanxiao Zhang, Jiayuan Sun, Yun Gu, Guang-Zhong Yang

AI Summary

The paper introduces EndoSERV, a novel vision-based localization method for robot-assisted endoluminal navigation that addresses challenges like tissue deformation and lack of landmarks. EndoSERV uses a segment-to-structure approach for long-range navigation and a real-to-virtual mapping technique to overcome label insufficiency by transferring real image features to a virtual domain with pose ground truth. Experiments on public and clinical datasets demonstrate the method's effectiveness, even without real pose labels, suggesting improved accuracy and robustness in endoluminal robot navigation.

Key Contribution

EndoSERV enables accurate endoluminal robot localization without real-world pose labels by mapping real image features into a virtual domain where pose ground truth is available.

Abstract

Robot-assisted endoluminal procedures are increasingly used for early cancer intervention. However, the intricate, narrow and tortuous pathways within the luminal anatomy pose substantial difficulties for robot navigation. Vision-based navigation offers a promising solution, but existing localization approaches are error-prone due to tissue deformation, in vivo artifacts and a lack of distinctive landmarks for consistent localization. This paper presents EndoSERV, a novel localization method that addresses these challenges. It has two main parts, SEgment-to-structure and Real-to-Virtual mapping, hence the name. Long-range and complex luminal structures are divided into smaller sub-segments, and the odometry is estimated independently for each. To cope with label insufficiency, an efficient transfer technique maps real image features to the virtual domain so that virtual pose ground truth can be used. Training consists of an offline pretraining phase that extracts texture-agnostic features and an online phase that adapts to real-world conditions. Extensive experiments on both public and clinical datasets demonstrate the effectiveness of the method even without any real pose labels.
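The segment-wise odometry idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sub-segment estimates are combined, so chaining them by rigid pose composition is an assumption, as are the planar (x, y, theta) pose representation and the helper names `compose`, `split_into_segments`, `segment_odometry` and `chain_segments`. The frame-to-frame motions stand in for whatever a learned estimator would output.

```python
import math

def compose(a, b):
    """Compose two planar poses (x, y, theta): apply b in the frame of a."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            at + bt)

def split_into_segments(motions, segment_len):
    """Divide a long motion sequence into fixed-length sub-segments (assumed scheme)."""
    return [motions[i:i + segment_len] for i in range(0, len(motions), segment_len)]

def segment_odometry(motions):
    """Accumulate frame-to-frame motions within one sub-segment, starting at identity."""
    pose = (0.0, 0.0, 0.0)
    for m in motions:
        pose = compose(pose, m)
    return pose

def chain_segments(segment_poses):
    """Chain independently estimated sub-segment poses into one global trajectory."""
    pose = (0.0, 0.0, 0.0)
    trajectory = [pose]
    for p in segment_poses:
        pose = compose(pose, p)
        trajectory.append(pose)
    return trajectory

# Hypothetical relative motions: advance 1 unit, then turn 90 degrees, four times.
motions = [(1.0, 0.0, math.pi / 2)] * 4
segments = split_into_segments(motions, segment_len=2)
trajectory = chain_segments([segment_odometry(s) for s in segments])
final_pose = trajectory[-1]
```

Because pose composition is associative, chaining per-segment results gives the same final pose as accumulating every motion in one pass; the practical appeal of splitting is that each sub-segment can be handled by its own estimator.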

Computer Vision · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
