The paper introduces the Error Sensitivity Profile (ESP), a metric to quantify a classification model's sensitivity to errors in individual or multiple features within the training data. ESP enables prioritization of data cleaning efforts by identifying error types and features that most significantly impact model performance. Experiments across 14 classification models and two datasets demonstrate that feature importance based on ESP differs from feature importance based on correlation with the target variable.
Forget blindly chasing correlations – this paper reveals that the features you *think* are most important for model performance might not be the ones where data cleaning yields the biggest gains.
The quality of training data is critical to the performance of machine learning models. This paper proposes the Error Sensitivity Profile (ESP), a metric that quantifies how sensitive model performance is to errors in a single feature or in multiple features. With ESP, data-cleaning efforts can be prioritized by the error types and features most likely to affect model performance. To support computing this metric, we built an integrated suite of tools called \dirty. We conduct an extensive experimental study on two widely used datasets using 14 classification models, and find that performance degradation is not always predictable from simple correlations with the target variable.
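To make the idea of a per-feature sensitivity concrete, here is a minimal sketch in Python. It is not the paper's definition of ESP or its tool suite: the abstract does not give the formula, so the sketch assumes sensitivity is the drop in test accuracy after corrupting one training feature at a given error rate. The error model (Gaussian noise), the nearest-centroid classifier, and every function name below are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, X):
    """Assign each row to the class whose centroid is closest."""
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def accuracy(model, X, y):
    return float((predict(model, X) == y).mean())


def inject_noise(X, feature, rate, rng, scale=5.0):
    """Assumed error model: add Gaussian noise to a fraction `rate`
    of the rows in one feature. Returns a corrupted copy of X."""
    Xc = X.copy()
    n = X.shape[0]
    k = int(round(rate * n))
    idx = rng.choice(n, size=k, replace=False)
    Xc[idx, feature] += rng.normal(0.0, scale * X[:, feature].std(), size=k)
    return Xc


def sensitivity_profile(X_train, y_train, X_test, y_test, rates, seed=0):
    """Return (baseline accuracy, matrix of accuracy drops).

    Entry [j, i] is baseline accuracy minus the accuracy of a model
    trained with feature j corrupted at error rate rates[i].
    """
    rng = np.random.default_rng(seed)
    baseline = accuracy(fit_centroids(X_train, y_train), X_test, y_test)
    profile = np.zeros((X_train.shape[1], len(rates)))
    for j in range(X_train.shape[1]):
        for i, rate in enumerate(rates):
            Xc = inject_noise(X_train, j, rate, rng)
            model = fit_centroids(Xc, y_train)
            profile[j, i] = baseline - accuracy(model, X_test, y_test)
    return baseline, profile


def make_toy_data(n, rng):
    """Synthetic data: feature 0 separates the classes, feature 1 is noise."""
    y = rng.integers(0, 2, size=n)
    X = np.column_stack([y * 2.0 + rng.normal(size=n), rng.normal(size=n)])
    return X, y


rng = np.random.default_rng(42)
X_train, y_train = make_toy_data(400, rng)
X_test, y_test = make_toy_data(200, rng)
rates = [0.0, 0.1, 0.3, 0.5]
baseline, profile = sensitivity_profile(X_train, y_train, X_test, y_test, rates)
print("baseline accuracy:", baseline)
print("accuracy drop per feature (rows) and error rate (columns):")
print(profile)
```

Each row of the resulting matrix describes one feature, and each column one error rate. Ranking features by the size of these drops, rather than by their correlation with the target, is the kind of comparison the paper reports. In practice one would swap in the actual models, datasets, and error types of interest.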