GGD-SLAM addresses the challenge of visual SLAM in dynamic environments by incorporating a generalizable motion model without relying on semantic annotations or depth input. It uses a FIFO queue with sequential attention for dynamic semantic feature extraction and a dynamic feature enhancer to separate static and dynamic components. The system also introduces a static information sampling method for occlusion filling and a distractor-adaptive SSIM loss, achieving state-of-the-art performance in camera pose estimation and dense reconstruction on real-world dynamic datasets.
Achieve robust SLAM in dynamic environments without semantic labels or depth sensors by disentangling scene dynamics with a generalizable motion model.
Visual SLAM algorithms have improved significantly by adopting 3D Gaussian Splatting (3DGS) representations, particularly for generating high-fidelity dense maps. However, they rely on a static-environment assumption and degrade significantly in dynamic environments. This paper presents GGD-SLAM, a framework that employs a generalizable motion model to address localization and dense mapping in dynamic environments without predefined semantic annotations or depth input. Specifically, the proposed system uses a First-In-First-Out (FIFO) queue to manage incoming frames, enabling dynamic semantic feature extraction through a sequential attention mechanism. This is combined with a dynamic feature enhancer that separates static and dynamic components. Additionally, to minimize the impact of dynamic distractors on the static components, we devise a method that fills occluded areas via static information sampling, and we design a distractor-adaptive Structural Similarity Index Measure (SSIM) loss tailored to dynamic environments, significantly enhancing the system's resilience. Experiments on real-world dynamic datasets demonstrate that the proposed system achieves state-of-the-art performance in camera pose estimation and dense reconstruction in dynamic scenes.
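The abstract does not specify how the FIFO queue and the sequential attention interact, so the following is only a minimal sketch under stated assumptions: each frame is reduced to a single feature vector, the queue holds the most recent `capacity` frames, and a query vector attends over the queued features with plain scaled dot-product attention (queued features act as both keys and values). The class `FrameFeatureQueue` and its methods are hypothetical illustrations, not GGD-SLAM's code.

```python
import math
from collections import deque
from typing import List

Vector = List[float]


class FrameFeatureQueue:
    """Hypothetical FIFO buffer of per-frame feature vectors (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen discards the oldest entry once full, giving FIFO behaviour.
        self.buffer: deque = deque(maxlen=capacity)

    def push(self, feature: Vector) -> None:
        """Enqueue the feature vector of the newest frame."""
        self.buffer.append(feature)

    def attend(self, query: Vector) -> Vector:
        """Scaled dot-product attention of `query` over all queued features."""
        if not self.buffer:
            raise ValueError("queue is empty")
        d = len(query)
        # Similarity of the query to each queued frame feature, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in self.buffer]
        # Subtract the max score before exponentiating for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention output: softmax-weighted sum of the queued features.
        return [sum(w * feat[i] for w, feat in zip(weights, self.buffer)) for i in range(d)]


if __name__ == "__main__":
    queue = FrameFeatureQueue(capacity=3)
    for f in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]):
        queue.push(f)  # the first frame is evicted when the fourth arrives
    print(list(queue.buffer))
    print(queue.attend([0.0, 0.0]))  # zero query -> uniform weights -> mean of queued features
```

A bounded deque keeps memory constant as frames stream in, which is the usual motivation for a FIFO window in online SLAM; the actual attention design in the paper may use learned projections and dense feature maps rather than single vectors.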
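The exact form of the distractor-adaptive SSIM loss is not given in the abstract. As a rough illustration of the general idea of down-weighting dynamic regions, the sketch below computes standard SSIM over small uniform windows (3x3 here, instead of the common 11x11 Gaussian window) and averages the per-window term (1 - SSIM) weighted by a per-pixel static-weight map in [0, 1]. The weighting scheme and the names `local_ssim` and `masked_ssim_loss` are assumptions for illustration; the paper's formulation may differ.

```python
from typing import List

Image = List[List[float]]

# Standard SSIM stabilising constants for intensities in [0, 1] (K1 = 0.01, K2 = 0.03, L = 1).
C1 = (0.01 * 1.0) ** 2
C2 = (0.03 * 1.0) ** 2


def local_ssim(x: Image, y: Image, r: int, c: int, half: int = 1) -> float:
    """SSIM of the (2*half+1) x (2*half+1) uniform window centred at (r, c)."""
    xs, ys = [], []
    for i in range(r - half, r + half + 1):
        for j in range(c - half, c + half + 1):
            xs.append(x[i][j])
            ys.append(y[i][j])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx * mx + my * my + C1) * (vx + vy + C2))


def masked_ssim_loss(rendered: Image, target: Image, static_weight: Image, half: int = 1) -> float:
    """Weighted mean of (1 - SSIM) over valid window centres.

    static_weight[r][c] is a hypothetical per-pixel confidence that the pixel is static;
    windows centred on likely-dynamic pixels (weight near 0) barely affect the loss.
    """
    h, w = len(target), len(target[0])
    num, den = 0.0, 0.0
    for r in range(half, h - half):
        for c in range(half, w - half):
            wgt = static_weight[r][c]
            num += wgt * (1.0 - local_ssim(rendered, target, r, c, half))
            den += wgt
    return num / den if den > 0 else 0.0
```

For example, if a rendered 5x5 image differs from the target only at pixel (0, 0), the only affected window is the one centred at (1, 1); setting `static_weight[1][1] = 0.0` removes that discrepancy from the loss, while a weight of 1 everywhere yields a positive loss.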