Esslingen University of Applied Sciences · Apr 9, 2026 · arXiv:2604.08008

SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving

Felix Embacher, Jonas Uhrig, Marius Cordts, Markus Enzweiler

AI Summary

The authors introduce SearchAD, a large-scale (423k frames, 513k bounding boxes) image retrieval dataset for autonomous driving focused on rare, safety-critical scenarios across 90 categories. Unlike existing datasets, SearchAD emphasizes semantic image retrieval for tasks like text-to-image and image-to-image search, addressing the challenge of locating extremely rare classes. Experiments reveal that text-based retrieval methods outperform image-based approaches, and while fine-tuning improves performance, retrieval capabilities remain limited, highlighting the difficulty of the task.

Key Contribution

Finding needles in the haystack of autonomous driving data is harder than we thought: even with fine-tuning, current retrieval methods struggle to identify rare, safety-critical scenarios in the new SearchAD dataset.

Abstract

Retrieving rare and safety-critical driving scenarios from large-scale datasets is essential for building robust autonomous driving (AD) systems. As dataset sizes continue to grow, the key challenge shifts from collecting more data to efficiently identifying the most relevant samples. We introduce SearchAD, a large-scale rare image retrieval dataset for AD containing over 423k frames drawn from 11 established datasets. SearchAD provides high-quality manual annotations of more than 513k bounding boxes covering 90 rare categories. It specifically targets the needle-in-a-haystack problem of locating extremely rare classes, some of which appear fewer than 50 times across the entire dataset. Unlike existing benchmarks, which focus on instance-level retrieval, SearchAD emphasizes semantic image retrieval with a well-defined data split, enabling text-to-image and image-to-image retrieval, few-shot learning, and fine-tuning of multi-modal retrieval models. Comprehensive evaluations show that text-based methods outperform image-based ones due to stronger inherent semantic grounding. While models that directly align spatial visual features with language achieve the best zero-shot results, and our fine-tuning baseline significantly improves performance, absolute retrieval capabilities remain unsatisfactory. With a held-out test set on a public benchmark server, SearchAD establishes the first large-scale dataset for retrieval-driven data curation and long-tail perception research in AD: https://iis-esslingen.github.io/searchad/
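
To make the text-to-image retrieval setting concrete, the following is a minimal sketch of ranking images against a query embedding and scoring the ranking with average precision. It is illustrative only: the embeddings are hand-written toy vectors standing in for the output of some multi-modal encoder, the function names are hypothetical, and the abstract does not state which models or metrics SearchAD actually uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query_vec, image_vecs):
    """Return image indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, v) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


def average_precision(ranking, relevant):
    """Average precision of a ranking, given the set of relevant indices."""
    hits, total = 0, 0.0
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / rank  # precision at this hit's rank
    return total / len(relevant) if relevant else 0.0


# Toy image embeddings (assumed values, not taken from the dataset).
image_vecs = [
    [0.9, 0.1, 0.0],  # index 0: contains the rare category
    [0.0, 1.0, 0.0],  # index 1
    [0.1, 0.9, 0.1],  # index 2
    [0.8, 0.0, 0.2],  # index 3: contains the rare category
    [0.0, 0.2, 0.9],  # index 4
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for an embedded text query
relevant = {0, 3}            # ground-truth images for the queried category

ranking = rank_images(query_vec, image_vecs)
ap = average_precision(ranking, relevant)
```

Image-to-image retrieval fits the same sketch by replacing `query_vec` with the embedding of an example image. In a needle-in-a-haystack setting the relevant set is tiny relative to the gallery, so a few distractors ranked above the true matches lower average precision sharply.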

Computer Vision · Data Curation & Synthetic Data · Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 59

Year: 2026

Venue: N/A
