IMT School for Advanced Studies Lucca | Napoli | Apr 29, 2026 | arXiv:2604.26667

Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems

AI Summary

This paper investigates whether post-release faults in Python systems can be predicted with machine learning. The authors train and evaluate various ML and DL models in cross-project experiments on a balanced dataset of over 4,000 labeled faults, using 83 product, process, statistical, and Python-specific metrics along with normalized code representations. The key finding is that supervised metric-based models (RandomForest, XGBoost, CatBoost) far outperform LLMs and unsupervised models, reaching a recall of 0.85-0.9 and cutting false negatives by an order of magnitude. Process metrics are the most predictive.

Key Contribution

Forget LLMs: simple process metrics like code age, churn, and developer activity are the real MVPs for predicting bugs that slip into production Python code.

Abstract

Python's dynamic nature complicates testing and increases the chance that some defects evade detection, so effective fault prediction becomes essential. We examine whether post-release faults can be predicted using modern ML and DL. Using a balanced dataset of over 4,000 labeled faults with 83 product, process, statistical, and Python-specific metrics plus normalized code representations, we conduct cross-project experiments. LLMs and unsupervised models fail to distinguish residual from non-residual faults, while supervised metric-based models (RandomForest, XGBoost, CatBoost) perform far better, yielding a recall of 0.85-0.9 and cutting false negatives by an order of magnitude. Process metrics, especially age, churn, and developer activity, alongside class and file size, consistently prove most predictive. Notably, Principal Component Analysis shows that metrics and code embeddings occupy distinct regions of the representation space, suggesting that they capture complementary rather than redundant information.
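To make the evaluation terms in the abstract concrete, here is a minimal sketch of a cross-project split and of how recall and false negatives are counted. It is not the authors' pipeline: the leave-one-project-out protocol, the function names, the `project`, `label`, and `churn` fields, and the toy records are all assumptions made for illustration, and no classifier is trained.

```python
from collections import defaultdict


def recall_and_false_negatives(y_true, y_pred):
    """Return (recall, false_negatives) for the positive class.

    Label 1 is taken to mean "residual fault" (an assumption of this sketch).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fn


def leave_one_project_out(samples):
    """Yield (held_out_project, train, test) so the test project is unseen in training.

    This is one common way to run cross-project experiments; the paper's
    exact protocol is not stated in the abstract.
    """
    by_project = defaultdict(list)
    for sample in samples:
        by_project[sample["project"]].append(sample)
    for held_out in sorted(by_project):
        train = [s for name, rows in by_project.items() if name != held_out for s in rows]
        yield held_out, train, by_project[held_out]


# Hypothetical toy records: one process metric (churn) and a residual-fault label.
samples = [
    {"project": "alpha", "churn": 120, "label": 1},
    {"project": "alpha", "churn": 8, "label": 0},
    {"project": "beta", "churn": 95, "label": 1},
    {"project": "beta", "churn": 15, "label": 0},
    {"project": "gamma", "churn": 60, "label": 1},
    {"project": "gamma", "churn": 5, "label": 0},
]

splits = list(leave_one_project_out(samples))
```

Recall is the share of true residual faults that a model flags, so each false negative is a residual fault that would reach production unnoticed. That is why the abstract reports recall and false negatives together.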

Code Generation & Program Synthesis

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
