Luxembourg | Luxembourg Institute of Science and Technology | Feb 12, 2026 | arXiv:2602.12038

An Empirical Study of the Imbalance Issue in Software Vulnerability Detection

Yuejun Guo, Qiang Hu, Qiang Tang, Y. L. Traon

AI Summary

This paper investigates the impact of data imbalance on deep learning-based software vulnerability detection using nine open-source datasets and two state-of-the-art DL models. The study confirms that data imbalance significantly affects model performance and that existing imbalance solutions exhibit varying effectiveness across datasets and evaluation metrics. The authors find that focal loss improves precision, mean false error and class-balanced loss improve recall, and random over-sampling improves F1-measure, but no single solution excels across all metrics.

Key Contribution

Turns out, your fancy deep learning model for vulnerability detection is probably struggling because of imbalanced data, and the fix isn't as simple as just throwing in focal loss.

Abstract

Vulnerability detection is crucial to protect software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, leveraging its superior ability to extract patterns and representations within extensive code volumes. Despite its promise, DL-based vulnerability detection remains in its early stages, with model performance exhibiting variability across datasets. Drawing insights from other well-explored application areas like computer vision, we conjecture that the imbalance issue (the number of vulnerable code samples is extremely small) is at the core of the phenomenon. To validate this, we conduct a comprehensive empirical study involving nine open-source datasets and two state-of-the-art DL models. The results confirm our conjecture. We also obtain insightful findings on how existing imbalance solutions perform in vulnerability detection. It turns out that these solutions also perform differently across datasets and evaluation metrics. Specifically: 1) focal loss is more suitable to improve the precision, 2) mean false error and class-balanced loss encourage the recall, and 3) random over-sampling facilitates the F1-measure. However, none of them excels across all metrics. To delve deeper, we explore external influences on these solutions and offer insights for developing new solutions.
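
To make the imbalance solutions named in the abstract more concrete, the following is a minimal sketch of three of them (focal loss, class-balanced weighting, and random over-sampling) in plain Python. It follows the commonly cited textbook formulations of these techniques, not necessarily the configuration used in the paper. The function names, the default values of `gamma`, `alpha`, and `beta`, and the convention that label 1 means "vulnerable" are all assumptions made for illustration. Mean false error is omitted here.

```python
import math
import random


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p is the predicted probability of the positive class (assumed here
    to be "vulnerable", y == 1). gamma and alpha are illustrative defaults.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    # (1 - p_t) ** gamma down-weights samples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights (1 - beta) / (1 - beta ** n).

    counts maps a class label to its number of training samples n.
    Rare classes receive larger weights.
    """
    return {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels
```

In this sketch, the first two functions change how much each sample contributes to the training loss, while the third changes the training set itself. That difference in mechanism is one plausible reason the solutions trade off precision, recall, and F1-measure differently, as the abstract reports.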

Code Generation & Program Synthesis | Computer Vision

Citation Metrics

Citations: 4

Influential citations: 0

References: 51

Year: 2026

Venue: European Symposium on Research in Computer Security
