Digital Research Center of Sfax; DLR; Apr 21, 2026; arXiv:2604.19411

GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes

Joshua Niemeijer, Alaa Eddine Ben Zekri, Reza Bahmanyar, Philipp M. Schmälzle, Houda Chaabouni-Chouayakh, Franz Kurz

AI Summary

The paper introduces GOLD-BEV, a framework for learning dense bird's-eye-view (BEV) semantic maps of dynamic road scenes from ego-centric sensors, using synchronized aerial imagery as supervision during training. BEV-aligned aerial crops serve as an intuitive target space for dense semantic annotation, avoiding the ambiguity of ego-only BEV labeling, while strict aerial-ground synchronization mitigates the temporal inconsistencies of non-synchronized overhead sources. The framework extends beyond aerial coverage by synthesizing pseudo-aerial BEV images from ego sensors, which supports lightweight human annotation and uncertainty-aware pseudo-labeling on unlabeled drives.

Key Contribution

Time-synchronized aerial imagery, used as supervision only during training, enables dense, geometrically consistent BEV semantic mapping of dynamic road scenes from ego-centric sensors.

Abstract

Understanding road scenes in a geometrically consistent, scene-centric representation is crucial for planning and mapping. We present GOLD-BEV, a framework that learns dense bird's-eye-view (BEV) semantic environment maps, including dynamic agents, from ego-centric sensors, using time-synchronized aerial imagery as supervision only during training. BEV-aligned aerial crops provide an intuitive target space, enabling dense semantic annotation with minimal manual effort and avoiding the ambiguity of ego-only BEV labeling. Crucially, strict aerial-ground synchronization allows overhead observations to supervise moving traffic participants and mitigates the temporal inconsistencies inherent to non-synchronized overhead sources. To obtain scalable dense targets, we generate BEV pseudo-labels using domain-adapted aerial teachers, and jointly train BEV segmentation with optional pseudo-aerial BEV reconstruction for interpretability. Finally, we extend beyond aerial coverage by learning to synthesize pseudo-aerial BEV images from ego sensors, which support lightweight human annotation and uncertainty-aware pseudo-labeling on unlabeled drives.
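The abstract does not say how the uncertainty-aware pseudo-labeling is implemented. As a rough illustration only, the sketch below shows one common way such a step can be done: take per-pixel class probabilities from a teacher model on a BEV crop, compute normalised entropy, and mask out pixels whose entropy exceeds a threshold. The function name, the `IGNORE_INDEX` value, and the `max_entropy` threshold are all hypothetical and are not taken from the paper.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical label value for pixels excluded from training


def pseudo_labels_from_teacher(probs, max_entropy=0.5):
    """Turn teacher probabilities into pseudo-labels, masking uncertain pixels.

    probs: array of shape (C, H, W) with per-class probabilities that sum
           to 1 over the class axis (assumed teacher softmax output).
    max_entropy: assumed threshold on normalised entropy, in [0, 1].
    Returns an (H, W) uint8 label map.
    """
    probs = np.asarray(probs, dtype=np.float64)
    num_classes = probs.shape[0]
    # Shannon entropy per pixel, divided by log(C) so it lies in [0, 1].
    log_probs = np.log(np.clip(probs, 1e-12, 1.0))
    entropy = -(probs * log_probs).sum(axis=0) / np.log(num_classes)
    # Most likely class per pixel; uncertain pixels are then overwritten.
    labels = probs.argmax(axis=0).astype(np.uint8)
    labels[entropy > max_entropy] = IGNORE_INDEX
    return labels


# Two classes on a 1x2 BEV grid: one confident pixel, one ambiguous pixel.
example_probs = np.array([[[0.99, 0.5]],
                          [[0.01, 0.5]]])
example_labels = pseudo_labels_from_teacher(example_probs)
```

In this toy input the confident pixel keeps its class, while the 50/50 pixel is marked with `IGNORE_INDEX` so that a downstream segmentation loss could skip it. Other uncertainty measures (for example, maximum probability or ensemble disagreement) would slot into the same structure.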

Computer Vision; Data Curation & Synthetic Data; Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
