IBM Research | Jun 8, 2026 | arXiv:2606.09670

Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision

Mateo Diaz-Bone, Daniel Caraballo, Florian Scheidegger, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Roy Assaf, Niccolo Avogaro, Yagmur G. Cinar, Brown Ebouky, Filip M. Janicki, Piotr S. Kluska, Cezary Skura, Cristiano Malossi

AI Summary

This paper addresses the limitations of existing anomaly detection methods, which struggle under variable real-world conditions, by introducing a visual prompting pipeline that isolates objects with foreground-background masking. It also improves domain adaptability by unfreezing the teacher in student-teacher models, and it uses a data augmentation strategy based on diffusion-generated synthetic images. With the Masked Multiscale Reconstruction (MMR) model as its backbone, the approach improves on the previous state of the art on the AeBAD dataset by 3.5 percentage points.

Key Contribution

Combining visual prompting with synthetic data improves anomaly detection, with a 3.5 percentage point gain over the previous state of the art on the AeBAD dataset.

Abstract

Recent anomaly detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods face challenges when foundational assumptions (such as consistent object scale, viewpoint, background, illumination, and centered placement) are violated. Such variations render anomaly detection methods unusable in many real-world scenarios. To address these limitations, we introduce three key contributions: (1) a visual prompting pipeline that isolates objects using foreground-background masking; (2) a mechanism for unfreezing the teacher in student-teacher models to improve domain adaptability; and (3) a data augmentation strategy leveraging diffusion-generated synthetic images to enhance anomaly detection performance. Using the Masked Multiscale Reconstruction (MMR) model as our backbone, we achieve a 3.5 percentage point improvement over the previous state-of-the-art on the challenging AeBAD dataset.
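
The foreground-background masking idea in contribution (1) can be illustrated with a minimal sketch. The abstract does not say how the mask is produced or where in the pipeline it is applied, so everything below is an assumption for illustration: the mask is taken as given (a boolean array, True for the object), and the helper names `apply_foreground_mask` and `foreground_image_score` are hypothetical, not the authors' code.

```python
import numpy as np


def apply_foreground_mask(image: np.ndarray, mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Keep foreground pixels and replace background pixels with `fill`.

    image: (H, W, C) array; mask: (H, W) boolean array, True = foreground.
    Hypothetical helper; the mask source is assumed, not taken from the paper.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    # Broadcast the (H, W) mask over the channel axis.
    return np.where(mask[..., None], image, fill).astype(image.dtype)


def foreground_image_score(anomaly_map: np.ndarray, mask: np.ndarray) -> float:
    """Image-level score: maximum anomaly value over foreground pixels only."""
    if not mask.any():
        return 0.0  # assumption: an empty foreground yields a zero score
    return float(anomaly_map[mask].max())


# Toy 2x2 example: only the diagonal pixels belong to the object.
image = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)
mask = np.array([[True, False], [False, True]])
masked = apply_foreground_mask(image, mask)

# The largest raw value (0.9) lies in the background and is ignored.
anomaly_map = np.array([[0.1, 0.9], [0.2, 0.3]])
score = foreground_image_score(anomaly_map, mask)
```

In this toy case the background pixel with the highest raw anomaly value does not affect the image-level score, which is the kind of background sensitivity that masking is meant to reduce.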

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
