This paper introduces a Structure-to-Image (S2I) paradigm for zero-shot monocular depth estimation (MDE) in colonoscopy, addressing the sim-to-real domain gap by using depth maps as the generative foundation. The method incorporates phase congruency for domain adaptation and a cross-level structure constraint to improve both geometric structure and fine-grained details. Experiments on a phantom dataset demonstrate that MDE models fine-tuned on S2I-generated data achieve up to a 44.18% reduction in RMSE compared to existing methods.
Achieve up to a 44% RMSE reduction in zero-shot monocular depth estimation for colonoscopy by turning depth maps from a passive constraint into an active generative foundation for sim-to-real adaptation.
Monocular depth estimation (MDE) for colonoscopy is hampered by the domain gap between simulated and real-world images. Existing image-to-image translation methods, which use depth as a posterior constraint, often produce structural distortions and specular highlights by failing to balance realism with structure consistency. To address this, we propose a Structure-to-Image paradigm that transforms the depth map from a passive constraint into an active generative foundation. We are the first to introduce phase congruency to colonoscopic domain adaptation and design a cross-level structure constraint to co-optimize geometric structures and fine-grained details like vascular textures. In zero-shot evaluations conducted on a publicly available phantom dataset, the MDE model that was fine-tuned on our generated data achieved a maximum reduction of 44.18% in RMSE compared to competing methods. Our code is available at https://github.com/YyangJJuan/PC-S2I.git.
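The headline figure is a relative reduction in RMSE, so it may help to see how such a percentage is typically derived. The sketch below is illustrative only: the function names `rmse` and `relative_reduction`, the masking of non-positive ground-truth depths, and the toy values are all assumptions, not the paper's evaluation protocol or data.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error over paired depth values.

    Pixels whose ground-truth depth is not positive are skipped
    (an assumed validity convention, not taken from the paper).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Toy depth values (hypothetical, not from the paper); the last pixel is invalid.
gt = [10.0, 20.0, 30.0, 0.0]
baseline_pred = [14.0, 16.0, 34.0, 5.0]
adapted_pred = [12.0, 18.0, 32.0, 5.0]

baseline = rmse(baseline_pred, gt)
adapted = rmse(adapted_pred, gt)
print(f"RMSE reduction: {relative_reduction(baseline, adapted):.2f}%")
```

Under this definition, a 44.18% reduction means the fine-tuned model's RMSE is about 0.56 times that of the method it is compared against.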