This paper investigates instability in software engineering analytics. The instability arises from the correlation-based split criteria used by symbolic models and from the heuristic approximations used by causal discovery algorithms. The authors propose adding causality-aware splitting to decision trees, namely conditional-entropy split criteria combined with confounder filtering, to improve their stability and robustness. Through experiments on 120+ multi-objective optimization tasks, they show that causality-aware trees are more stable than correlation-based trees (EZR) and approach the stability of human expert judgments, without sacrificing predictive performance.
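The abstract does not define the conditional-entropy criterion, so the sketch below shows only one plausible reading. It assumes discrete features and a single known confounder column `z`. It contrasts the correlational information gain I(X;Y) = H(Y) - H(Y|X) with the conditional mutual information I(X;Y|Z) = H(Y|Z) - H(Y|X,Z). It then treats confounder filtering as dropping features whose gain vanishes once Z is conditioned on. The functions `information_gain`, `conditional_gain`, and `filter_confounded` and the toy data are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(y, *conds):
    """H(Y | conds): entropy of y inside each joint value of the
    conditioning columns, weighted by group size."""
    groups = {}
    for i, label in enumerate(y):
        key = tuple(col[i] for col in conds)
        groups.setdefault(key, []).append(label)
    return sum(len(g) / len(y) * entropy(g) for g in groups.values())


def information_gain(x, y):
    """Correlational split score: I(X;Y) = H(Y) - H(Y|X)."""
    return entropy(y) - conditional_entropy(y, x)


def conditional_gain(x, y, z):
    """Hypothetical causality-aware split score: I(X;Y|Z) = H(Y|Z) - H(Y|X,Z).
    Any association with y that z already explains contributes nothing."""
    return conditional_entropy(y, z) - conditional_entropy(y, x, z)


def filter_confounded(features, y, z, eps=1e-9):
    """Hypothetical confounder filter: keep only the features whose
    gain survives conditioning on the confounder z."""
    return [name for name, x in features.items()
            if conditional_gain(x, y, z) > eps]


# Toy data (hypothetical): z is a confounder, x1 merely mirrors z,
# and x2 influences y independently of z.
z = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
y = [a or b for a, b in zip(z, x2)]  # y = z OR x2
```

On this toy data both features score the same under information gain (about 0.311 bits). Only `x2` keeps a positive conditional gain (0.5 bits), so `filter_confounded({"x1": x1, "x2": x2}, y, z)` returns `["x2"]`. A correlational tree could split on either feature, while the conditioned score consistently prefers the feature whose association is not explained by `z`.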
Causality-aware decision trees offer a more stable and robust alternative to traditional correlation-based models in software engineering, approaching the stability of human expert judgments.
Background: Symbolic models, particularly decision trees, are widely used in software engineering (SE) for explainable analytics in defect prediction, configuration tuning, and software quality assessment. Most of these models rely on correlational split criteria, such as variance reduction or information gain, which identify statistical associations between a feature X and an outcome Y but cannot establish that X causes Y. Recent empirical studies in SE show that both correlational models and causal discovery algorithms suffer from pronounced instability. This instability arises from two complementary issues: (1) correlation-based methods conflate association with causation, and (2) causal discovery algorithms rely on heuristic approximations to cope with the NP-hard nature of structure learning, so their inferred graphs vary widely under minor input perturbations. Together, these issues undermine trust, reproducibility, and the reliability of explanations in real-world SE tasks.

Objective: This study investigates whether incorporating causality-aware split criteria into symbolic models can improve their stability and robustness, and whether such gains come at the cost of predictive or optimization performance. We additionally examine how the stability of human expert judgments compares to that of automated models.

Method: Using 120+ tasks from the MOOT repository of multi-objective optimization tasks, we evaluate stability through a preregistered bootstrap-ensemble protocol that measures the variance of win-score assignments. We compare the stability of human causal assessments with that of correlation-based decision trees (EZR) and of causality-aware trees, which use conditional-entropy split criteria and confounder filtering. Stability and performance differences are analyzed with statistical measures (variance, Gini impurity, the Kolmogorov-Smirnov (KS) test, and Cliff's delta).
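The abstract names a preregistered bootstrap-ensemble protocol but not its exact win-score rule. The sketch below assumes one simple version. Each bootstrap resample ranks candidate options by mean cost, where lower is better. An option's win score is the number of other options it beats. Stability is the per-option variance of that score across resamples, where 0 means the option is ranked identically every time. `win_scores`, `bootstrap_win_variance`, `n_boot`, and the toy data are hypothetical. In the study, the ranking would come from the fitted model (EZR or a causality-aware tree) rather than from raw means.

```python
import math
import random
import statistics


def win_scores(sample, options):
    """Win score per option: how many other options have a strictly
    higher (worse) mean cost in this sample. Options missing from the
    sample get an infinite mean cost, so they never win."""
    costs = {o: [] for o in options}
    for option, cost in sample:
        costs[option].append(cost)
    means = {o: statistics.fmean(c) if c else math.inf for o, c in costs.items()}
    return {o: sum(means[o] < means[p] for p in options if p != o) for o in options}


def bootstrap_win_variance(rows, n_boot=30, seed=0):
    """Resample rows with replacement n_boot times, score every option
    in each resample, and return the population variance of each
    option's win score across resamples (lower = more stable)."""
    rng = random.Random(seed)
    options = sorted({option for option, _ in rows})
    runs = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        runs.append(win_scores(sample, options))
    return {o: statistics.pvariance([run[o] for run in runs]) for o in options}


# Hypothetical toy data: each option is measured 30 times.
separated = [(o, c) for o, c in (("a", 1.0), ("b", 2.0), ("c", 3.0)) for _ in range(30)]
noise = random.Random(1)
overlapping = [(o, noise.gauss(mu, 1.0))
               for o, mu in (("a", 1.0), ("b", 1.1), ("c", 3.0)) for _ in range(30)]

print(bootstrap_win_variance(separated))    # expect all 0: same ranking in every resample
print(bootstrap_win_variance(overlapping))  # a and b overlap, so their scores may vary
```

Using variance of a rank-derived score, rather than of raw predictions, makes models with different output scales comparable. Under this protocol, human judgments, EZR, and causality-aware trees could all be scored the same way.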
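Three of the statistical measures named in the Method can be sketched in dependency-free Python. The function names are hypothetical. The KS function returns only the two-sample statistic D, the largest gap between the two empirical CDFs, and computes no p-value. For p-values, `scipy.stats.ks_2samp` is an established alternative. Applying Gini impurity to a list of discrete outcomes, such as the winning option in each bootstrap run, is an assumption about how the authors use it. The numbers below are toy values, not results from the paper.

```python
from bisect import bisect_right
from collections import Counter


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-sample pairs, in [-1, 1]."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(t) - F_b(t)|,
    evaluated at every observed value t."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, t):
        # Fraction of the sorted sample that is <= t.
        return bisect_right(sample, t) / len(sample)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in set(a) | set(b))


def gini_impurity(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the label frequencies p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


# Toy stability scores for two hypothetical models (lower = more stable).
tree_a = [0.10, 0.12, 0.08, 0.11]
tree_b = [0.20, 0.25, 0.18, 0.22]
print(cliffs_delta(tree_a, tree_b))         # -1.0: every tree_a value is smaller
print(ks_statistic(tree_a, tree_b))         # 1.0: the samples do not overlap
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
```

Cliff's delta reports how often one sample exceeds the other, and KS reports whether the two distributions differ at all. Used together, they separate the size of a stability gap from its mere presence.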