The paper introduces GroundingAnomaly, a diffusion-based framework for few-shot anomaly synthesis that targets two weaknesses of existing methods: anomalies that blend poorly with the image because they are inpainted, and masks that are inaccurate. It uses a Spatial Conditioning Module that takes per-pixel semantic maps for spatial control, and a Gated Self-Attention Module that injects conditioning tokens into a frozen U-Net. Experiments on the MVTec AD and VisA datasets show that GroundingAnomaly generates high-quality anomalies and achieves state-of-the-art performance in anomaly detection, segmentation, and instance-level detection.
Synthesizing realistic anomalies for industrial inspection is now possible with just a few examples, thanks to spatially-grounded diffusion that outperforms existing inpainting techniques.
The performance of visual anomaly inspection in industrial quality control is often constrained by the scarcity of real anomalous samples. Consequently, anomaly synthesis techniques have been developed to enlarge training sets and enhance downstream inspection. However, existing methods either suffer from poor integration caused by inpainting or fail to provide accurate masks. To address these limitations, we propose GroundingAnomaly, a novel few-shot anomaly image generation framework. Our framework introduces a Spatial Conditioning Module that leverages per-pixel semantic maps to enable precise spatial control over the synthesized anomalies. Furthermore, a Gated Self-Attention Module is designed to inject conditioning tokens into a frozen U-Net via gated attention layers. This carefully preserves pretrained priors while ensuring stable few-shot adaptation. Extensive evaluations on the MVTec AD and VisA datasets demonstrate that GroundingAnomaly generates high-quality anomalies and achieves state-of-the-art performance across multiple downstream tasks, including anomaly detection, segmentation, and instance-level detection.
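The abstract does not give the formula behind the Gated Self-Attention Module, so the following is only a minimal sketch of one common way to inject conditioning tokens through a gated attention layer, modeled on GLIGEN-style gating rather than on the authors' implementation. Visual tokens and conditioning tokens are concatenated, passed through self-attention, and the visual part of the result is added back as a residual scaled by `tanh(gamma)`. All names here (`gated_self_attention`, `gamma`, the projection matrices, the token counts) are hypothetical, and the single-head NumPy attention stands in for a real U-Net block.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (n_tokens, dim)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def gated_self_attention(visual, cond, w_q, w_k, w_v, gamma):
    """Assumed gating form: visual + tanh(gamma) * SelfAttn([visual; cond])[:n_visual].

    With gamma = 0 the layer is an identity on the visual tokens, so the
    frozen backbone's behaviour is unchanged at the start of adaptation.
    """
    n_visual = visual.shape[0]
    joint = np.concatenate([visual, cond], axis=0)
    # Keep only the outputs at the visual-token positions.
    attended = self_attention(joint, w_q, w_k, w_v)[:n_visual]
    return visual + np.tanh(gamma) * attended


rng = np.random.default_rng(0)
dim = 8
visual = rng.normal(size=(16, dim))  # e.g. a flattened 4x4 feature map
cond = rng.normal(size=(4, dim))     # hypothetical conditioning tokens
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

out_closed = gated_self_attention(visual, cond, w_q, w_k, w_v, gamma=0.0)
out_open = gated_self_attention(visual, cond, w_q, w_k, w_v, gamma=1.0)
```

In this sketch, `out_closed` equals `visual` because the gate is closed, which illustrates how a gate initialized at zero can leave pretrained priors intact. In a few-shot setting, only the gate and the new attention weights would be trained while the U-Net stays frozen.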