NVIDIA | Feb 17, 2026 | arXiv:2602.15298

X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection

Qi Zhang, Dian Chen, Lance M. Kaplan, Audun Jøsang, Dong Hyun Jeong, Jin-Hee Cho

AI Summary

The paper introduces X-MAP, a framework for analyzing misclassifications in spam and phishing detection by identifying topic-level semantic patterns associated with model failures. X-MAP uses SHAP values and non-negative matrix factorization to create interpretable topic profiles for correctly classified and misclassified instances, then quantifies the deviation of individual messages from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets demonstrate that X-MAP achieves high AUROC (up to 0.98) and significantly reduces false rejection rates, while also recovering a substantial portion of falsely rejected correct predictions when used as a repair layer.

Key Contribution

Uncover why your spam filter fails: X-MAP reveals topic-level semantic patterns that expose the weaknesses of your detection model.

Abstract

Misclassifications in spam and phishing detection are harmful: false negatives expose users to attacks, while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least twice the divergence of correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. When used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. These results demonstrate X-MAP's effectiveness and interpretability for improving spam and phishing detection.
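The profile-deviation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the SHAP attribution and NMF steps have already produced non-negative per-message topic weights, and the function names (`normalize`, `build_profile`, `js_divergence`) and the three-topic numbers are hypothetical, chosen only for illustration.

```python
import math

def normalize(weights):
    """Scale non-negative topic weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def build_profile(topic_rows):
    """Average per-message topic distributions into one reference profile."""
    dists = [normalize(row) for row in topic_rows]
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

# Hypothetical topic weights (3 topics) for reliably classified spam messages.
# In practice these would come from factorizing SHAP attributions with NMF.
reliable_spam = [[8.0, 1.0, 1.0], [7.0, 2.0, 1.0], [9.0, 0.5, 0.5]]
profile = build_profile(reliable_spam)

typical = normalize([8.0, 1.5, 0.5])   # topic mix close to the profile
atypical = normalize([0.5, 1.0, 8.5])  # mass concentrated on a different topic

score_typical = js_divergence(typical, profile)
score_atypical = js_divergence(atypical, profile)
```

Under these assumptions, a message whose topic mix departs from the profile receives a larger divergence score, which is the signal the abstract associates with likely misclassification. How a score is thresholded, and how separate profiles for spam/phishing and legitimate messages are combined, is left open here.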

Interpretability & Mechanistic Interp | Natural Language Processing | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
