This paper introduces MaskDiff-AD, a forward-only anomaly detection method that uses masked diffusion models trained on nominal data to score anomalies by reconstruction difficulty. It masks coordinates of a test sample and measures how hard they are to reconstruct, so it operates directly on discrete state spaces without reverse-time sampling. Experiments on tabular and text anomaly detection benchmarks show competitive performance, with MaskDiff-AD achieving the best overall average rank against the twelve tabular baselines.
Anomaly detection in discrete data without reverse-time sampling: MaskDiff-AD scores samples in a forward-only pass and achieves the best overall average rank among the compared tabular baselines.
Anomaly detection aims to identify samples that deviate from the nominal data distribution and is central to many safety-critical applications. However, developing effective anomaly detection methods for categorical, mixed-type, and discrete sequence data remains challenging and relatively underexplored. Masked diffusion models provide a natural way to model such data by learning to recover masked values from the remaining visible context. In this paper, we propose Masked Diffusion for Anomaly Detection (MaskDiff-AD), a forward-only method based on masked diffusion models trained only on nominal data. Given a test sample, MaskDiff-AD constructs anomaly scores from the difficulty of reconstructing randomly masked coordinates, yielding a content-sensitive score that operates directly on discrete state spaces while avoiding reverse-time sampling. We also develop a non-parametric variant of MaskDiff-AD and provide theoretical guarantees by characterizing Type-I and Type-II errors under a fixed detection threshold. Experiments on fourteen categorical and mixed-type tabular datasets from ADBench and UADAD, as well as four text anomaly detection datasets from NLP-ADBench, show that MaskDiff-AD achieves competitive performance against classical, diffusion-based, and recent tabular/text anomaly detection baselines. Notably, MaskDiff-AD achieves the best overall average rank, outperforming all twelve tabular baseline methods.
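The scoring idea in the abstract (mask random coordinates, then measure how hard they are to reconstruct from the visible context) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not give the exact scoring formula, so the score here is simply the mean negative log-likelihood of the masked coordinates, averaged over random masks. The predictor is a similarity-weighted frequency estimator standing in for a trained masked diffusion model; it is not the paper's model or its non-parametric variant, and the names `fit_stand_in_predictor`, `anomaly_score`, `mask_rate`, and `n_masks` are hypothetical.

```python
import numpy as np


def fit_stand_in_predictor(train):
    """Build a toy conditional predictor from nominal data (assumption:
    this replaces the trained masked diffusion model for illustration).

    `train` is an (n, d) integer array of category codes starting at 0.
    """
    train = np.asarray(train)
    n_cats = train.max(axis=0) + 1  # categories per column

    def predict_proba(x, mask, j):
        # Weight each nominal row by how many visible coordinates it shares with x.
        visible = ~mask
        agree = (train[:, visible] == x[visible]).sum(axis=1)
        weights = np.exp(agree.astype(float))
        counts = np.bincount(train[:, j], weights=weights, minlength=int(n_cats[j]))
        # Small additive smoothing keeps every probability strictly positive.
        return (counts + 1e-3) / (counts.sum() + 1e-3 * n_cats[j])

    return predict_proba


def anomaly_score(x, predict_proba, mask_rate=0.5, n_masks=32, seed=0):
    """Average reconstruction difficulty of randomly masked coordinates.

    Only forward masking is used: each mask needs one prediction per masked
    coordinate, and nothing is sampled in reverse time.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    d = x.shape[0]
    total = 0.0
    for _ in range(n_masks):
        mask = rng.random(d) < mask_rate
        if not mask.any():  # ensure at least one coordinate is masked
            mask[rng.integers(d)] = True
        nll = 0.0
        for j in np.flatnonzero(mask):
            probs = predict_proba(x, mask, j)
            nll -= np.log(probs[x[j]])
        total += nll / mask.sum()
    return total / n_masks


# Toy nominal data: three categorical columns that always agree.
train = np.array([[0, 0, 0]] * 50 + [[1, 1, 1]] * 50)
predictor = fit_stand_in_predictor(train)

nominal_score = anomaly_score(np.array([0, 0, 0]), predictor)
anomalous_score = anomaly_score(np.array([0, 1, 0]), predictor)  # breaks the pattern
```

In this toy setup the sample that breaks the column agreement is harder to reconstruct from its visible context, so it receives the higher score. In practice the stand-in predictor would be replaced by the trained model's conditional distributions, and a detection threshold would be applied to the score.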