Cambridge · Center for Machine Learning (MCML) · KIT · TU Munich · Apr 13, 2026 · arXiv:2604.11401

GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors

Qilin Zhang, Jinyu Zhu, Olaf Wysocki, Benjamin Busam, Boris Jutzi

AI Summary

GS4City introduces a hierarchical semantic Gaussian Splatting method that uses CityGML models as priors for urban scene understanding. It applies two-pass raycasting to derive image-aligned masks from LoD3 CityGML models, using parent-child relationships to validate and recover fine-grained facade elements. By fusing these geometry-grounded masks with foundation-model predictions and learning a compact identity encoding for each Gaussian, GS4City outperforms existing 2D-driven semantic 3DGS baselines on the TUM2TWIN and Gold Coast datasets.
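The summary does not spell out the raycasting procedure, so the following Python is only a minimal sketch of one plausible reading, under stated assumptions. Surfaces are simplified to fronto-parallel rectangles under an orthographic camera; real LoD3 geometry would need mesh raycasting against calibrated camera poses. The first pass takes the nearest hit over all surfaces. The second pass raycasts child elements alone and accepts a child hit only where the first pass saw that child's own parent and the child lies within a depth tolerance behind it. The names `Surface`, `raycast`, `two_pass_masks`, and `tol` are hypothetical and do not come from the GS4City code.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Surface:
    """Hypothetical, simplified CityGML surface: a fronto-parallel rectangle."""
    sid: int                      # surface id
    x0: float                     # image-plane extent in pixels
    x1: float
    y0: float
    y1: float
    depth: float                  # distance from the camera along the ray
    parent: Optional[int] = None  # parent surface id (e.g. the wall of a window)


def raycast(surfaces: List[Surface], width: int, height: int):
    """Orthographic nearest-hit raycast: per-pixel surface ids (-1 = miss) and depths."""
    ys, xs = np.mgrid[0:height, 0:width] + 0.5  # pixel centres
    ids = np.full((height, width), -1, dtype=int)
    depth = np.full((height, width), np.inf)
    for s in surfaces:
        inside = (s.x0 <= xs) & (xs < s.x1) & (s.y0 <= ys) & (ys < s.y1)
        closer = inside & (s.depth < depth)
        depth[closer] = s.depth
        ids[closer] = s.sid
    return ids, depth


def two_pass_masks(surfaces: List[Surface], width: int, height: int, tol: float = 0.5):
    """Assumed two-pass scheme (hypothetical, not the paper's code)."""
    # Pass 1: all surfaces; the nearest hit wins, so recessed windows may be hidden.
    ids1, depth1 = raycast(surfaces, width, height)
    # Pass 2: child elements only, so a window is not blocked by its own wall.
    children = [s for s in surfaces if s.parent is not None]
    ids2, depth2 = raycast(children, width, height)
    parent_of = np.full_like(ids2, -1)
    for s in children:
        parent_of[ids2 == s.sid] = s.parent
    # Accept a child hit only if pass 1 saw its own parent and the child lies
    # at most `tol` behind that parent; otherwise keep the pass-1 label.
    with np.errstate(invalid="ignore"):  # inf - inf where both passes miss
        near_parent = (depth2 - depth1) <= tol
    valid = (ids2 >= 0) & (parent_of == ids1) & near_parent
    mask = ids1.copy()
    mask[valid] = ids2[valid]
    return mask


# Usage: a wall (1) with a recessed window (2), a foreground occluder (4),
# and a mis-registered window (5) lying far behind the wall.
scene = [
    Surface(1, 0, 10, 0, 10, depth=5.0),
    Surface(4, 0, 3, 0, 10, depth=2.0),
    Surface(2, 2, 5, 2, 4, depth=5.2, parent=1),
    Surface(5, 7, 9, 7, 9, depth=9.0, parent=1),
]
mask = two_pass_masks(scene, width=10, height=10)
```

In this toy scene, the recessed window is recovered where its wall is visible and stays hidden behind the occluder. The window lying far behind the wall is rejected by the depth check.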

Key Contribution

Rather than relying solely on ambiguous 2D foundation models, GS4City uses city-model priors to boost semantic Gaussian Splatting, improving coarse building segmentation by up to 15.8 IoU points.
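IoU for a class is the number of pixels in the intersection of the predicted and ground-truth regions divided by the number in their union. mIoU averages that score over classes. An improvement of 15.8 IoU points is an absolute gain of 15.8 on a 0 to 100 scale (for example, from 60.0 to 75.8). The Python below is a minimal sketch that computes both metrics on single label maps. The paper's exact protocol is not given here, including the class set and whether counts are accumulated over the whole dataset. The names `iou` and `miou` are illustrative.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    """Intersection over union of one class between predicted and ground-truth label maps."""
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else float("nan")


def miou(pred: np.ndarray, gt: np.ndarray, classes) -> float:
    """Mean IoU over classes, skipping classes absent from both maps (NaN scores)."""
    return float(np.nanmean([iou(pred, gt, c) for c in classes]))


# Toy 2x4 label maps for coarse segmentation: 1 = building, 0 = background.
gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0]])
building_iou = iou(pred, gt, 1)    # 3 / 5 = 0.6, i.e. 60.0 IoU points
mean_iou = miou(pred, gt, [0, 1])  # (0.6 + 0.6) / 2 = 0.6
```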

Abstract

Recent semantic 3D Gaussian Splatting (3DGS) methods primarily rely on 2D foundation models, often yielding ambiguous boundaries and limited support for structured urban semantics. While city models such as CityGML encode hierarchically organized semantics together with building geometry, these labels cannot be directly mapped to Gaussian primitives. We present GS4City, a hierarchical semantic Gaussian Splatting method that incorporates city-model priors for urban scene understanding. GS4City derives reliable image-aligned masks from Level of Detail (LoD) 3 CityGML models via two-pass raycasting, explicitly using parent-child relations to validate and recover fine-grained facade elements. It then fuses these geometry-grounded masks with foundation-model predictions to establish scene-consistent instance correspondences, and learns a compact identity encoding for each Gaussian under joint 2D identity supervision and 3D spatial regularization. Experiments on the TUM2TWIN and Gold Coast datasets show that GS4City effectively incorporates structured building semantics into Gaussian scene representations, outperforming existing 2D-driven semantic 3DGS baselines, including LangSplat and Gaga, by up to 15.8 IoU points in coarse building segmentation and 14.2 mIoU points in fine-grained semantic segmentation. By bridging structured city models and photorealistic Gaussian scene representations, GS4City enables semantically queryable and structure-aware urban reconstruction. Code is available at https://github.com/Jinyzzz/GS4City.
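The abstract names the ingredients of the identity encoding but not their exact form. As a hedged illustration, the Python below assumes a generic recipe:

- Each Gaussian carries a low-dimensional identity vector.
- These vectors are alpha-blended front to back into a per-pixel feature.
- A linear classifier with softmax cross-entropy against the pixel's instance ID supplies the 2D supervision.
- A plain L2 penalty between each Gaussian and its k nearest neighbours in 3D stands in for the spatial regularizer.

Opacities are passed in directly rather than evaluated from projected 2D Gaussians. The names `composite`, `identity_ce`, `knn_regularizer`, `W`, and `lam` are all hypothetical.

```python
import numpy as np


def composite(encodings: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Front-to-back alpha blending of per-Gaussian identity encodings along one ray.

    encodings: (N, D), sorted near to far; alphas: (N,) opacities in [0, 1].
    """
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return (alphas * transmittance) @ encodings


def identity_ce(feature: np.ndarray, W: np.ndarray, target: int) -> float:
    """Assumed 2D identity loss: softmax cross-entropy of a linear classifier."""
    logits = W @ feature
    logits = logits - logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])


def knn_regularizer(positions: np.ndarray, encodings: np.ndarray, k: int = 1) -> float:
    """Assumed 3D term: pull each encoding toward those of its k nearest Gaussians."""
    dist = np.linalg.norm(positions[:, None] - positions[None], axis=-1)
    np.fill_diagonal(dist, np.inf)        # exclude each Gaussian itself
    nn = np.argsort(dist, axis=1)[:, :k]  # (N, k) neighbour indices
    diff = encodings[:, None] - encodings[nn]
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


# Four Gaussians in two spatial clusters, with 2-D identity encodings.
positions = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 5, 5], [5.1, 5, 5]])
enc = np.array([[1.0, 0], [1.0, 0], [0.0, 1], [0.0, 1]])
W = np.array([[2.0, 0.0], [0.0, 2.0]])  # hypothetical classifier over 2 instances

feature = composite(enc[:2], np.array([0.5, 0.5]))  # pixel covered by cluster 0
loss_2d = identity_ce(feature, W, target=0)
loss_3d = knn_regularizer(positions, enc, k=1)
lam = 0.1  # hypothetical weight of the 3D term
total = loss_2d + lam * loss_3d
```

The two Gaussians in each cluster share an encoding, so the 3D term is zero with k = 1. Raising k to 2 pulls in a neighbour from the other cluster and makes it positive, which illustrates why the neighbourhood size matters for this kind of regularizer.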

Computer Vision · Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
