This paper introduces the Credibility Index via Explanation Stability (CIES), a novel metric to quantify the robustness of XAI explanations (SHAP, LIME) under realistic data perturbations in business contexts. CIES uses a rank-weighted distance function to penalize instability in the most important features, reflecting the higher business impact of changes in key decision drivers. Experiments across three datasets and four tree-based models demonstrate CIES's superior discriminative power compared to a uniform baseline, highlighting the impact of model complexity and data balancing on explanation credibility.
A "credibility warning system" for AI-driven business decisions is now possible, thanks to a new metric that reveals how much explanations wobble when the data shifts.
Explainable Artificial Intelligence (XAI) methods (SHAP, LIME) are increasingly adopted to interpret models in high-stakes business settings. However, the credibility of these explanations, that is, their stability under realistic data perturbations, remains unquantified. This paper introduces the Credibility Index via Explanation Stability (CIES), a mathematically grounded metric that measures how robust a model's explanations are when subject to realistic business noise. CIES captures whether the reasons behind a prediction remain consistent, not just the prediction itself. The metric employs a rank-weighted distance function that penalizes instability in the most important features disproportionately, reflecting business semantics where changes in top decision drivers are more consequential than changes in marginal features. We evaluate CIES across three datasets (customer churn, credit risk, employee attrition), four tree-based classification models, and two data balancing conditions. Results demonstrate that model complexity impacts explanation credibility, that class imbalance treatment via SMOTE affects not only predictive performance but also explanation stability, and that CIES provides statistically superior discriminative power compared to a uniform baseline metric (p < 0.01 in all 24 configurations). A sensitivity analysis across four noise levels confirms the robustness of the metric itself. These findings offer business practitioners a deployable "credibility warning system" for AI-driven decision support.
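The abstract does not give the CIES formula, so the following is only a minimal sketch of the general idea of a rank-weighted stability score, not the authors' definition. It assumes feature attributions (for example from SHAP or LIME) are already available as plain lists of numbers. The 1/rank weighting, the normalization, and the names `rank_weights`, `weighted_distance`, and `stability_score` are all hypothetical choices made for illustration.

```python
from typing import Sequence


def _normalize(attributions: Sequence[float]) -> list[float]:
    """Scale absolute attributions so they sum to 1 (all zeros stay zeros)."""
    magnitudes = [abs(a) for a in attributions]
    total = sum(magnitudes)
    return [m / total for m in magnitudes] if total > 0 else magnitudes


def rank_weights(base: Sequence[float]) -> list[float]:
    """Assumed weighting: 1/rank by importance in the base explanation,
    rescaled to sum to 1. Rank 1 is the feature with the largest magnitude."""
    order = sorted(range(len(base)), key=lambda i: -abs(base[i]))
    raw = [0.0] * len(base)
    for rank, i in enumerate(order, start=1):
        raw[i] = 1.0 / rank
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(
    base: Sequence[float], perturbed: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted absolute difference between two normalized explanations."""
    a, b = _normalize(base), _normalize(perturbed)
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def stability_score(
    base: Sequence[float],
    perturbed_runs: Sequence[Sequence[float]],
    rank_weighted: bool = True,
) -> float:
    """1 minus the mean distance between the base explanation and the
    explanations obtained under perturbed inputs (higher = more stable).
    With rank_weighted=False every feature gets the same weight, which
    plays the role of a uniform baseline."""
    n = len(base)
    weights = rank_weights(base) if rank_weighted else [1.0 / n] * n
    distances = [weighted_distance(base, p, weights) for p in perturbed_runs]
    return 1.0 - sum(distances) / len(distances)


if __name__ == "__main__":
    base = [0.4, 0.3, 0.2, 0.1]            # hypothetical attributions
    top_swapped = [0.3, 0.4, 0.2, 0.1]     # top two drivers trade places
    bottom_swapped = [0.4, 0.3, 0.1, 0.2]  # bottom two features trade places
    print(stability_score(base, [top_swapped]))
    print(stability_score(base, [bottom_swapped]))
```

In this toy setup, swapping the two most important features lowers the rank-weighted score more than swapping the two least important ones, whereas the uniform variant scores both perturbations identically. That difference is the kind of discriminative behavior the abstract attributes to rank weighting.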