This paper addresses the degraded performance of distractor-free 3D Gaussian Splatting (3DGS) under sparse input conditions. The problem arises because color residual heuristics become unreliable when observations are limited. The authors propose a framework that draws on three priors: the geometry foundation model VGGT for camera parameter estimation and initial point cloud generation, VGGT's attention maps for semantic entity matching, and Vision-Language Models (VLMs) to identify and preserve large static regions. Experimental results demonstrate the effectiveness and robustness of the approach in mitigating transient distractors during sparse-view 3DGS training.
Sparse-view 3D Gaussian Splatting gets a major boost by incorporating priors from geometry foundation models and VLMs to overcome the limitations of color residual heuristics.
3D Gaussian Splatting (3DGS) enables efficient training and fast novel view synthesis in static environments. To address challenges posed by transient objects, distractor-free 3DGS methods have emerged and shown promising results when dense image captures are available. However, their performance degrades significantly under sparse input conditions. This limitation primarily stems from relying on color residual heuristics to guide training, which become unreliable with limited observations. In this work, we propose a framework to enhance distractor-free 3DGS under sparse-view conditions by incorporating rich prior information. Specifically, we first adopt the geometry foundation model VGGT to estimate camera parameters and generate a dense set of initial 3D points. Then, we harness the attention maps from VGGT for efficient and accurate semantic entity matching. Additionally, we utilize Vision-Language Models (VLMs) to further identify and preserve the large static regions in the scene. We also demonstrate how these priors can be seamlessly integrated into existing distractor-free 3DGS methods. Extensive experiments confirm the effectiveness and robustness of our approach in mitigating transient distractors for sparse-view 3DGS training.
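To make the "color residual heuristic" concrete, here is a toy sketch of the generic idea: a pixel is flagged as transient when the residual between the rendered and captured color exceeds a threshold, and flagged pixels are dropped from the photometric loss. It also shows one *assumed* way a static-region prior (for example a VLM-derived mask) could override that decision. The function names, the fixed threshold, and the grayscale nested-list image format are illustrative assumptions. This is not the paper's implementation; published distractor-free methods refine the residual test considerably (for example with adaptive thresholds or feature-based grouping).

```python
from typing import List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[bool]]    # True = pixel treated as transient (excluded)


def residual_transient_mask(rendered: Image, observed: Image, threshold: float) -> Mask:
    """Generic heuristic (assumed fixed threshold): flag large color residuals as transient."""
    return [
        [abs(r - o) > threshold for r, o in zip(row_r, row_o)]
        for row_r, row_o in zip(rendered, observed)
    ]


def protect_static_regions(transient: Mask, static_prior: Mask) -> Mask:
    """Hypothetical override: never flag a pixel that an external prior marks as static."""
    return [
        [t and not s for t, s in zip(row_t, row_s)]
        for row_t, row_s in zip(transient, static_prior)
    ]


def masked_l1_loss(rendered: Image, observed: Image, transient: Mask) -> float:
    """Mean L1 photometric loss over the pixels not flagged as transient."""
    total, count = 0.0, 0
    for row_r, row_o, row_t in zip(rendered, observed, transient):
        for r, o, t in zip(row_r, row_o, row_t):
            if not t:
                total += abs(r - o)
                count += 1
    return total / count if count else 0.0


# Usage: a 2x2 scene where an early, poorly fitted rendering is uniformly gray.
rendered = [[0.5, 0.5], [0.5, 0.5]]
observed = [[0.5, 1.0], [0.0, 0.625]]
transient = residual_transient_mask(rendered, observed, threshold=0.25)
# Assume the prior says pixel (1, 0) belongs to a large static surface.
static_prior = [[False, False], [True, False]]
protected = protect_static_regions(transient, static_prior)
print(transient, masked_l1_loss(rendered, observed, transient))
print(protected, masked_l1_loss(rendered, observed, protected))
```

In this toy case the residual test alone discards pixel (1, 0) even though the prior marks it static, so that region would receive no supervision. With few views, poorly reconstructed static regions can produce exactly such large residuals, which is why a prior that keeps them in the loss is useful.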