Feb 19, 2026 · arXiv:2602.17473

4D Monocular Surgical Reconstruction under Arbitrary Camera Motions

Jiwei Shan, Zeyu Cai, Cheng-Tai Hsieh, Yirui Li, Hao Liu, Lijun Han, Hesheng Wang, Shing Shin Cheng

AI Summary

The paper introduces Local-EndoGS, a novel framework for 4D reconstruction of deformable surgical scenes from monocular endoscopic videos with arbitrary camera motion. It addresses limitations of existing methods by employing a progressive, window-based global representation with local deformable scene models, and a coarse-to-fine initialization strategy leveraging multi-view geometry, cross-window information, and monocular depth priors. Experiments on endoscopic datasets demonstrate that Local-EndoGS achieves superior appearance quality and geometry compared to state-of-the-art methods.

Key Contribution

A window-based approach improves 4D reconstruction of deformable surgical scenes from monocular endoscope videos with large camera motion, without requiring stereo depth priors or accurate structure-from-motion.

Abstract

Reconstructing deformable surgical scenes from endoscopic videos is challenging and clinically important. Recent state-of-the-art methods based on implicit neural representations or 3D Gaussian splatting have made notable progress. However, most are designed for deformable scenes with fixed endoscope viewpoints and rely on stereo depth priors or accurate structure-from-motion for initialization and optimization, limiting their ability to handle monocular sequences with large camera motion in real clinical settings. To address this, we propose Local-EndoGS, a high-quality 4D reconstruction framework for monocular endoscopic sequences with arbitrary camera motion. Local-EndoGS introduces a progressive, window-based global representation that allocates local deformable scene models to each observed window, enabling scalability to long sequences with substantial motion. To overcome unreliable initialization without stereo depth or accurate structure-from-motion, we design a coarse-to-fine strategy integrating multi-view geometry, cross-window information, and monocular depth priors, providing a robust foundation for optimization. We further incorporate long-range 2D pixel trajectory constraints and physical motion priors to improve deformation plausibility. Experiments on three public endoscopic datasets with deformable scenes and varying camera motions show that Local-EndoGS consistently outperforms state-of-the-art methods in appearance quality and geometry. Ablation studies validate the effectiveness of our key designs. Code will be released upon acceptance at: https://github.com/IRMVLab/Local-EndoGS.
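The progressive, window-based allocation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how windows are delimited, so the rule used here (open a new window once the camera has moved more than a threshold from the window's first frame) is an assumption, and the names `Window`, `allocate_windows`, and `max_travel` are hypothetical rather than taken from the Local-EndoGS code. The camera positions are likewise assumed to be rough per-frame estimates.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    start: int  # index of the first frame in the window (inclusive)
    end: int    # index one past the last frame in the window (exclusive)


def allocate_windows(camera_positions, max_travel):
    """Split a frame sequence into consecutive windows.

    Hypothetical rule (an assumption, not the paper's criterion): a new
    window opens when the camera is farther than `max_travel` from the
    position at the current window's first frame.
    """
    windows = []
    if not camera_positions:
        return windows
    start = 0
    anchor = camera_positions[0]
    for i, pos in enumerate(camera_positions):
        if math.dist(pos, anchor) > max_travel:
            # Close the current window and start a new one at frame i.
            windows.append(Window(start, i))
            start, anchor = i, pos
    # Close the final window at the end of the sequence.
    windows.append(Window(start, len(camera_positions)))
    return windows


# Six frames with the camera translating one unit per frame along x.
positions = [(float(x), 0.0, 0.0) for x in range(6)]
windows = allocate_windows(positions, max_travel=2.0)
```

In a scheme of this kind, each resulting window would own its own local deformable scene model, so the number of models grows with the distance the camera covers rather than being fixed in advance.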

Computer Vision, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
