Apr 9, 2026 | arXiv:2604.08535

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

Simon Gerstenecker, Andreas Geiger, Katrin Renz

AI Summary

The authors introduce Fail2Drive, a paired-route CARLA benchmark with 200 routes and 17 new scenario classes, designed to evaluate closed-loop autonomous driving generalization under appearance, layout, behavioral, and robustness shifts. Each shifted route is matched with an in-distribution counterpart, which isolates the effect of the shift and enables quantitative diagnosis of failure modes. Evaluating multiple state-of-the-art models reveals consistent degradation, with an average success-rate drop of 22.8%, and exposes failure modes such as ignoring objects clearly visible in the LiDAR and failing to learn the concepts of free and occupied space.

Key Contribution

State-of-the-art closed-loop driving models are brittle under distribution shift: on shifted routes, success rate drops by 22.8% on average, with surprising failures such as ignoring objects that are clearly visible in the LiDAR.

Abstract

Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorization rather than robust driving behavior. We introduce Fail2Drive, the first paired-route benchmark for closed-loop generalization in CARLA, with 200 routes and 17 new scenario classes spanning appearance, layout, behavioral, and robustness shifts. Each shifted route is matched with an in-distribution counterpart, isolating the effect of the shift and turning qualitative failures into quantitative diagnostics. Evaluating multiple state-of-the-art models reveals consistent degradation, with an average success-rate drop of 22.8%. Our analysis uncovers unexpected failure modes, such as ignoring objects clearly visible in the LiDAR and failing to learn the fundamental concepts of free and occupied space. To accelerate follow-up work, Fail2Drive includes an open-source toolbox for creating new scenarios and validating solvability via a privileged expert policy. Together, these components establish a reproducible foundation for benchmarking and improving closed-loop driving generalization. We open-source all code, data, and tools at https://github.com/autonomousvision/fail2drive.
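The paired-route idea in the abstract can be illustrated with a small sketch. It assumes each route pair yields one success flag for the in-distribution route and one for its shifted counterpart, and it reports the drop as a difference in percentage points. The record layout (`PairedResult`), the function names, and the percentage-point convention are all hypothetical; none of them is taken from the Fail2Drive toolbox, and the paper may define its 22.8% figure differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """Outcome of one route pair (hypothetical record layout, not the toolbox format)."""
    shift_type: str        # e.g. "appearance", "layout", "behavioral", "robustness"
    base_success: bool     # in-distribution route was completed successfully
    shifted_success: bool  # matched shifted route was completed successfully


def success_rate_drop(results):
    """Return (base_rate, shifted_rate, drop), all in percentage points.

    Assumption: the drop is the plain difference between the two success
    rates over the same set of pairs.
    """
    if not results:
        raise ValueError("at least one paired result is required")
    n = len(results)
    base_rate = 100.0 * sum(r.base_success for r in results) / n
    shifted_rate = 100.0 * sum(r.shifted_success for r in results) / n
    return base_rate, shifted_rate, base_rate - shifted_rate


def drop_by_shift_type(results):
    """Group pairs by shift type and compute the drop within each group."""
    groups = defaultdict(list)
    for r in results:
        groups[r.shift_type].append(r)
    return {name: success_rate_drop(group)[2] for name, group in groups.items()}


def broken_pairs(results):
    """Pairs solved in-distribution but failed under the shift.

    These are the cases where pairing attributes the failure to the shift
    itself rather than to general route difficulty.
    """
    return [r for r in results if r.base_success and not r.shifted_success]
```

Because both routes of a pair differ only in the shift, a pair that succeeds in-distribution and fails when shifted points at the shift as the cause, which is what lets a qualitative failure be counted.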

Eval Frameworks & Benchmarks | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
