Apr 14, 2026arXiv:2604.12837

GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

Yi Liu, Haoxuan Xu, H. Duan, Hongbo Duan, Keyu Fan, Zhengyang Zhang, Peiyu Zhuang, Pengting Luo, Houde Liu

AI Summary

GGD-SLAM addresses the challenge of visual SLAM in dynamic environments by incorporating a generalizable motion model without relying on semantic annotations or depth input. It uses a FIFO queue with sequential attention for dynamic semantic feature extraction and a dynamic feature enhancer to separate static and dynamic components. The system also introduces a static information sampling method for occlusion filling and a distractor-adaptive SSIM loss, achieving state-of-the-art performance in camera pose estimation and dense reconstruction on real-world dynamic datasets.

Key Contribution

Achieve robust SLAM in dynamic environments without semantic labels or depth sensors by disentangling scene dynamics with a generalizable motion model.

Abstract

Visual SLAM algorithms achieve significant improvements through the exploration of 3D Gaussian Splatting (3DGS) representations, particularly in generating high-fidelity dense maps. However, they depend on a static environment assumption and experience significant performance degradation in dynamic environments. This paper presents GGD-SLAM, a framework that employs a generalizable motion model to address the challenges of localization and dense mapping in dynamic environments - without predefined semantic annotations or depth input. Specifically, the proposed system employs a First-In-First-Out (FIFO) queue to manage incoming frames, facilitating dynamic semantic feature extraction through a sequential attention mechanism. This is integrated with a dynamic feature enhancer to separate static and dynamic components. Additionally, to minimize dynamic distractors'impact on the static components, we devise a method to fill occluded areas via static information sampling and design a distractor-adaptive Structure Similarity Index Measure (SSIM) loss tailored for dynamic environments, significantly enhancing the system's resilience. Experiments conducted on real-world dynamic datasets demonstrate that the proposed system achieves state-of-the-art performance in camera pose estimation and dense reconstruction in dynamic scenes.

Computer Vision Robotics & Embodied AI

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

Related Papers