Jun 1, 2026 | arXiv:2606.02956

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

R. Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, N. Rack, Kaiwen Wang, F. Konstantinidis, Julian Truetsch, Carlos Fernández, Annika Batz, Kevin Rosch, Marlon Steiner, William Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, G. Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

AI Summary

The KITScenes Multimodal dataset addresses limitations of existing autonomous driving datasets in sensor fidelity, map completeness, and geographic diversity. It provides a fully synchronized sensor suite (high-resolution global-shutter cameras, long-range lidar beyond 400 m, 4D imaging radar, and redundant GNSS/INS localization) together with HD maps validated through autonomous driving trials on open-source software. In these maps, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D with full topological connectivity. The dataset also introduces four benchmarks for spatial learning in embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving.

Key Contribution

KITScenes Multimodal is a European autonomous driving dataset that pairs high-fidelity sensor data with HD maps that are, to the authors' knowledge, the most complete of any public sensor dataset.

Abstract

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400 m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 67

Year: 2026

Venue: N/A
