The paper introduces RAID, a retrieval-augmented unsupervised anomaly detection framework that leverages retrieved normal samples to guide noise suppression in anomaly map generation. RAID constructs a hierarchical vector database to retrieve class-, semantic-, and instance-level representations, forming a coarse-to-fine retrieval pipeline. A guided Mixture-of-Experts (MoE) network then adaptively suppresses matching noise based on the retrieved samples, producing fine-grained anomaly maps.
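The coarse-to-fine retrieval described above can be illustrated with a minimal sketch. It assumes a toy in-memory database and cosine similarity at every level; the nested-dictionary layout, the function names, and the parameter `k` are hypothetical choices for illustration and are not taken from the RAID codebase.

```python
import numpy as np

def cosine_sim(query, bank):
    """Cosine similarity between one query vector (D,) and each row of bank (N, D)."""
    q = query / (np.linalg.norm(query) + 1e-8)
    b = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + 1e-8)
    return b @ q

def coarse_to_fine_retrieve(query, db, k=2):
    """Retrieve normal exemplars in three stages: class -> semantic cluster -> instances.

    Assumed (hypothetical) layout of db:
      {class_name: {"prototype": (D,),
                    "clusters": [{"centroid": (D,), "instances": (N, D)}, ...]}}
    """
    # Class level: pick the class whose prototype is closest to the query.
    class_names = list(db)
    prototypes = np.stack([db[c]["prototype"] for c in class_names])
    cls = class_names[int(np.argmax(cosine_sim(query, prototypes)))]

    # Semantic level: pick the closest cluster centroid within that class.
    clusters = db[cls]["clusters"]
    centroids = np.stack([c["centroid"] for c in clusters])
    cluster_idx = int(np.argmax(cosine_sim(query, centroids)))

    # Instance level: return the top-k most similar normal samples in that cluster.
    instances = clusters[cluster_idx]["instances"]
    top = np.argsort(-cosine_sim(query, instances))[:k]
    return cls, cluster_idx, instances[top]
```

Each stage narrows the search space for the next, so the instance-level comparison only touches samples that already agree with the query at the class and semantic levels.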
RAID reinterprets unsupervised anomaly detection through the lens of Retrieval-Augmented Generation, using retrieved normal samples to guide noise suppression in anomaly map generation, and achieves state-of-the-art performance.
Unsupervised Anomaly Detection (UAD) aims to identify abnormal regions by establishing correspondences between test images and normal templates. Existing methods primarily rely on image reconstruction or template retrieval but face a fundamental challenge: matching between test images and normal templates inevitably introduces noise due to intra-class variations, imperfect correspondences, and limited templates. Observing that Retrieval-Augmented Generation (RAG) leverages retrieved samples directly in the generation process, we reinterpret UAD through this lens and introduce RAID, a retrieval-augmented UAD framework designed for noise-resilient anomaly detection and localization. Unlike standard RAG, which enriches context or knowledge, we focus on using retrieved normal samples to guide noise suppression in anomaly map generation. RAID retrieves class-, semantic-, and instance-level representations from a hierarchical vector database, forming a coarse-to-fine pipeline. A matching cost volume correlates the input with retrieved exemplars, followed by a guided Mixture-of-Experts (MoE) network that leverages the retrieved samples to adaptively suppress matching noise and produce fine-grained anomaly maps. RAID achieves state-of-the-art performance across full-shot, few-shot, and multi-dataset settings on the MVTec, VisA, MPDD, and BTAD benchmarks. Code is available at https://github.com/Mingxiu-Cai/RAID.
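To make the matching cost volume concrete, the sketch below correlates test patch features with the patch features of retrieved exemplars. It assumes cosine distance as the matching cost, which the abstract does not specify, and the function names and tensor shapes are hypothetical. The guided MoE network is not reproduced; a plain minimum over the volume stands in as a naive baseline, which is exactly the kind of noisy map the paper's MoE stage is meant to refine.

```python
import numpy as np

def matching_cost_volume(test_feats, exemplar_feats):
    """Cosine-distance cost between test patches and exemplar patches.

    test_feats:     (P, D)    patch features of the test image
    exemplar_feats: (K, Q, D) patch features of K retrieved normal exemplars
    returns:        (K, P, Q) cost volume, 0 = identical direction
    """
    t = test_feats / (np.linalg.norm(test_feats, axis=-1, keepdims=True) + 1e-8)
    e = exemplar_feats / (np.linalg.norm(exemplar_feats, axis=-1, keepdims=True) + 1e-8)
    sim = np.einsum("pd,kqd->kpq", t, e)  # pairwise cosine similarity
    return 1.0 - sim

def naive_anomaly_map(cost_volume):
    """Baseline stand-in for the learned stage: best match over all exemplars and patches."""
    return cost_volume.min(axis=(0, 2))  # (P,) per-patch anomaly score
```

A patch that has a close match in any retrieved exemplar scores near zero, while a patch with no good match scores high; imperfect matches on normal patches are the matching noise the abstract refers to.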