This paper introduces PLAG, a pseudo-label-guided anomaly generation method that enhances tabular anomaly detection by utilizing pseudo-anomalies to guide the identification of localized abnormal patterns in tabular features. By decoupling anomaly quantification into feature-level abnormalities and employing a two-stage data selection strategy, PLAG effectively mitigates the reliance on scarce ground-truth labels while improving the fidelity and diversity of synthetic anomalies. Extensive experiments show that PLAG outperforms eight baseline methods, achieving state-of-the-art results and boosting F1-scores by 0.08 to 0.21 when integrated with existing unsupervised detectors.
PLAG transforms tabular anomaly detection by leveraging pseudo-labels to uncover fine-grained, localized anomalies, outperforming traditional methods that struggle with label scarcity.
Identifying anomalous instances in tabular data is essential for improving data reliability and maintaining system stability. Because ground-truth anomaly labels are scarce, existing methods mainly rely on unsupervised anomaly detection models, or exploit a small number of labeled anomalies to facilitate detection via sample generation or contrastive learning. However, unsupervised methods lack sufficient anomaly awareness, while current generation and contrastive approaches tend to quantify anomalies globally, at the level of the whole sample, and overlook the localized anomaly patterns of individual tabular features, which results in suboptimal detection performance. To address these limitations, we propose PLAG, a pseudo-label-guided anomaly generation method designed to enhance tabular anomaly detection. Specifically, by using pseudo-anomalies as guidance signals and decoupling the overall anomaly quantification of a sample into an accumulation of feature-level abnormalities, PLAG not only obviates the need for scarce ground-truth labels but also gives the model a fine-grained view of localized anomalous signals. Furthermore, we propose a two-stage data selection strategy that integrates format verification and uncertainty estimation to filter candidate samples, ensuring the fidelity and diversity of the synthetic anomalies. These filtered synthetic anomalies then serve as discriminative guidance that helps the model separate normal and anomalous instances. Extensive experiments demonstrate that PLAG achieves state-of-the-art performance against eight representative baselines. Moreover, as a flexible framework, it integrates with existing unsupervised detectors, consistently boosting F1-scores by 0.08 to 0.21.
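To make the two central ideas of the abstract more concrete (a sample-level anomaly score written as a sum of feature-level abnormalities, and a two-stage filter over candidate synthetic anomalies), the following NumPy sketch may help. It is not the PLAG implementation, whose details are not given here. The robust z-score used as the per-feature abnormality, the top-fraction rule for choosing pseudo-anomalies, the range check standing in for format verification, and the score-margin rule standing in for uncertainty estimation are all assumptions chosen for illustration, as are all function and parameter names.

```python
import numpy as np


def feature_abnormality(X, ref):
    """Per-feature abnormality of each row of X (assumed: robust z-score vs. ref)."""
    med = np.median(ref, axis=0)
    mad = np.median(np.abs(ref - med), axis=0) + 1e-9  # avoid division by zero
    return np.abs(X - med) / mad  # shape: (n_samples, n_features)


def sample_score(X, ref):
    """Sample-level score as the accumulation (sum) of feature-level abnormalities."""
    return feature_abnormality(X, ref).sum(axis=1)


def pseudo_anomalies(X, top_frac=0.05):
    """Flag the highest-scoring rows as pseudo-anomalies; no ground-truth labels used."""
    scores = sample_score(X, X)
    k = max(1, int(round(top_frac * len(X))))
    mask = np.zeros(len(X), dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask


def select_candidates(cands, ref, lo, hi, threshold, margin=0.5):
    """Hypothetical two-stage filter for candidate synthetic anomalies."""
    # Stage 1 (stand-in for format verification): finite values within [lo, hi].
    valid = np.isfinite(cands).all(axis=1)
    valid &= ((cands >= lo) & (cands <= hi)).all(axis=1)
    # Stage 2 (stand-in for uncertainty estimation): keep only candidates whose
    # score clears the threshold by a relative margin, dropping borderline ones.
    confident = sample_score(cands, ref) > threshold * (1.0 + margin)
    return cands[valid & confident]


# Toy data: 200 rows, 4 features, with a localized shift in feature 2 of rows 0-4.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:5, 2] += 12.0

per_feature = feature_abnormality(X, X)
mask = pseudo_anomalies(X, top_frac=0.05)

# Four candidates: plausible anomaly, malformed (NaN), out of range, near-normal.
cands = np.array([
    [0.0, 0.0, 12.0, 0.0],
    [np.nan, 0.0, 12.0, 0.0],
    [0.0, 0.0, 100.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
])
threshold = np.median(sample_score(X, X))
kept = select_candidates(cands, X, lo=-20.0, hi=20.0, threshold=threshold)
```

In this toy setup, `per_feature` shows which column drives each row's score, which is the kind of localized signal a single global score would hide. A real pipeline would replace both filter stages with the paper's actual verification and uncertainty procedures.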