The paper introduces Group Counterfactual Integrated Gradients (GCIG), a novel in-processing regularization framework designed to enforce explanation invariance across different protected groups, conditioned on the true label. GCIG computes explanations relative to multiple group-conditional baselines and penalizes cross-group variation in these attributions during training, formalizing procedural fairness as group counterfactual explanation stability. Empirical results demonstrate that GCIG significantly reduces cross-group explanation disparity while maintaining competitive predictive performance and accuracy-fairness trade-offs compared to six state-of-the-art methods.
Achieving fairness doesn't just mean equal outcomes—this work shows how to enforce consistent reasoning across groups by penalizing disparities in counterfactual explanations.
Fairness research in machine learning has largely focused on outcome-oriented criteria such as Equalized Odds, while comparatively little attention has been given to procedural fairness, which concerns how a model arrives at its predictions. Neglecting procedural fairness leaves room for a model to produce different explanations for different protected groups, eroding trust. In this work, we introduce Group Counterfactual Integrated Gradients (GCIG), an in-processing regularization framework that enforces explanation invariance across groups, conditioned on the true label. For each input, GCIG computes explanations relative to multiple group-conditional baselines and penalizes cross-group variation in these attributions during training. GCIG formalizes procedural fairness as group counterfactual explanation stability and complements existing fairness objectives that constrain predictions alone. We compare GCIG empirically against six state-of-the-art methods; the results show that GCIG substantially reduces cross-group explanation disparity while maintaining competitive predictive performance and accuracy-fairness trade-offs. Aligning model reasoning across groups thus offers a principled and practical avenue for advancing fairness beyond outcome parity.
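The abstract does not give the exact formulation, but the core idea — integrated-gradients attributions computed against one baseline per protected group, with a penalty on their cross-group variation — can be sketched as follows. This is a minimal illustration under stated assumptions: the model is a simple logistic regressor (so gradients are analytic), `group_baselines` holds one assumed baseline vector per group, and `gcig_penalty` (a hypothetical name, not from the paper) uses variance across groups as the disparity measure; the paper's actual loss may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(w, x, baseline, steps=64):
    """Riemann-sum approximation of integrated gradients for a
    logistic model f(x) = sigmoid(w @ x), whose input gradient is
    f(x) * (1 - f(x)) * w. Attribution satisfies (approximately)
    sum(IG) = f(x) - f(baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    total = np.zeros_like(x)
    for a in alphas:
        z = baseline + a * (x - baseline)  # point on straight-line path
        p = sigmoid(w @ z)
        total += p * (1.0 - p) * w         # gradient of f at z
    return (x - baseline) * total / steps

def gcig_penalty(w, x, group_baselines):
    """Hypothetical GCIG-style regularizer: attribute x against each
    group-conditional baseline, then penalize the variance of the
    attributions across groups (zero iff all attributions agree)."""
    attrs = np.stack([integrated_gradients(w, x, b) for b in group_baselines])
    return attrs.var(axis=0).sum()
```

During training, a term like `lambda_fair * gcig_penalty(...)` would be added to the task loss, so gradient descent jointly minimizes prediction error and cross-group explanation disparity; the penalty vanishes exactly when every group-conditional baseline yields the same attribution vector.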