This paper addresses the problem of ghosting artifacts in 3D Gaussian Splatting (3DGS) caused by transient objects in multi-view captures. The authors propose a semantic filtering framework that uses CLIP similarity scores between rendered views and distractor text prompts to identify and remove transient Gaussians. The method improves reconstruction quality on the RobustNeRF benchmark compared to vanilla 3DGS, while maintaining real-time rendering and minimal memory overhead.
Ditch motion cues: CLIP-guided semantic filtering slashes ghosting artifacts in 3D Gaussian Splatting by identifying transient objects, even when parallax throws motion-based methods off track.
Transient objects in casual multi-view captures cause ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstruction. Existing solutions relied on scene decomposition at significant memory cost or on motion-based heuristics that were vulnerable to parallax ambiguity. A semantic filtering framework was proposed for category-aware transient removal using vision-language models. CLIP similarity scores between rendered views and distractor text prompts were accumulated per-Gaussian across training iterations. Gaussians exceeding a calibrated threshold underwent opacity regularization and periodic pruning. Unlike motion-based approaches, semantic classification resolved parallax ambiguity by identifying object categories independently of motion patterns. Experiments on the RobustNeRF benchmark demonstrated consistent improvement in reconstruction quality over vanilla 3DGS across four sequences, while maintaining minimal memory overhead and real-time rendering performance. Threshold calibration and comparisons with baselines validated semantic guidance as a practical strategy for transient removal in scenarios with predictable distractor categories.
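The accumulate, threshold, regularize, and prune loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `SemanticTransientFilter`, the weighted-mean accumulation rule, the L1 opacity penalty, and every default value (`threshold`, `strength`, `min_opacity`) are assumptions made for illustration. The CLIP similarity is treated as a number supplied by the caller, and the per-view blending weights stand in for whatever visibility information a real 3DGS renderer would provide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticTransientFilter:
    """Toy per-Gaussian accumulator of distractor similarity (hypothetical design)."""

    num_gaussians: int
    threshold: float = 0.5  # assumed value; the paper calibrates its own threshold
    score_sum: List[float] = field(init=False)
    weight_sum: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.score_sum = [0.0] * self.num_gaussians
        self.weight_sum = [0.0] * self.num_gaussians

    def accumulate(self, similarity: float, contributions: Dict[int, float]) -> None:
        """Record one training iteration.

        similarity: distractor similarity of the rendered view, assumed in [0, 1].
        contributions: Gaussian index -> blending weight in that view (assumed input).
        """
        for idx, weight in contributions.items():
            self.score_sum[idx] += weight * similarity
            self.weight_sum[idx] += weight

    def mean_score(self, idx: int) -> float:
        """Weighted mean similarity seen so far; 0.0 for never-visible Gaussians."""
        total = self.weight_sum[idx]
        return self.score_sum[idx] / total if total > 0 else 0.0

    def flagged(self) -> List[int]:
        """Indices whose accumulated score exceeds the threshold."""
        return [i for i in range(self.num_gaussians) if self.mean_score(i) > self.threshold]

    def opacity_penalty(self, opacities: List[float], strength: float = 0.01) -> float:
        """L1 penalty on the opacity of flagged Gaussians (one possible regularizer)."""
        return strength * sum(opacities[i] for i in self.flagged())

    def prune(self, opacities: List[float], min_opacity: float = 0.05) -> List[int]:
        """Indices to keep: drop flagged Gaussians whose opacity has decayed."""
        flagged = set(self.flagged())
        return [
            i
            for i in range(self.num_gaussians)
            if not (i in flagged and opacities[i] < min_opacity)
        ]


# Usage: Gaussian 0 is seen only in a distractor-heavy view, so it gets flagged.
filt = SemanticTransientFilter(num_gaussians=3)
filt.accumulate(0.9, {0: 1.0, 1: 0.2})
filt.accumulate(0.1, {1: 0.8, 2: 1.0})
kept = filt.prune(opacities=[0.02, 0.9, 0.8])
```

In a real training loop the penalty would be added to the photometric loss each iteration and `prune` would run periodically, so flagged Gaussians fade before they are removed. Because the flag depends only on accumulated semantic scores, a static object that appears to move under parallax is not penalized in this sketch.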