FairTree, a novel subgroup fairness auditing algorithm, addresses limitations of existing methods such as SliceFinder by directly handling continuous covariates and by decomposing performance disparities into bias and variance. The algorithm comes in two variations: a permutation-based approach and a fluctuation test. Simulation studies, including a direct comparison with SliceLine, show that both approaches keep false-positive rates at a satisfactory level and that the fluctuation test has higher power. The method is further illustrated on the UCI Adult Census dataset, providing a flexible framework for fairness evaluation.
Uncover hidden performance disparities in your ML models with FairTree, a new auditing tool that pinpoints fairness issues across continuous, categorical, and ordinal features while dissecting bias and variance contributions.
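One simple way to picture the bias and variance decomposition mentioned above is the identity MSE = (mean residual)² + (variance of residuals), computed separately within each subgroup. The sketch below assumes squared-error loss and residuals e = y - ŷ. The function name `decompose_mse` and its output layout are hypothetical, and FairTree's own test statistics may define the two components differently.

```python
import numpy as np


def decompose_mse(y_true, y_pred, groups):
    """Split each subgroup's mean squared error into squared bias and variance.

    Illustrative assumption: 'bias' is the mean residual in the subgroup and
    'variance' is the population variance (ddof=0) of the residuals, so that
    mse == bias**2 + variance holds exactly.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    resid = y_true - y_pred
    out = {}
    for g in np.unique(groups):
        r = resid[groups == g]
        out[g.item()] = {
            "mse": float(np.mean(r ** 2)),
            "bias": float(r.mean()),     # systematic over- or under-prediction
            "variance": float(r.var()),  # spread of the errors around that bias
        }
    return out


# Subgroup "b" has the same MSE in both cases, but for different reasons.
spread = decompose_mse([1, 2, 3, 4], [1, 2, 2, 5], ["a", "a", "b", "b"])
shift = decompose_mse([1, 2, 3, 4], [1, 2, 2, 3], ["a", "a", "b", "b"])
print(spread["b"])  # {'mse': 1.0, 'bias': 0.0, 'variance': 1.0}
print(shift["b"])   # {'mse': 1.0, 'bias': 1.0, 'variance': 0.0}
```

An overall metric such as the MSE cannot distinguish these two subgroups; the split into bias and variance tells whether the model is systematically off or merely noisier there.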
The evaluation of machine learning models typically relies on performance metrics based on loss functions, which risk overlooking changes in performance in relevant subgroups. Auditing tools such as SliceFinder and SliceLine were proposed to detect such subgroups, but they usually have conceptual drawbacks, such as the inability to handle continuous covariates directly. In this paper, we introduce FairTree, a novel algorithm adapted from psychometric invariance testing. Unlike SliceFinder and related algorithms, FairTree directly handles continuous, categorical, and ordinal features without discretization. It further decomposes performance disparities into systematic bias and variance, which allows changes in algorithm performance to be categorized. We propose and evaluate two variations of the algorithm: a permutation-based approach, which is conceptually closer to SliceFinder, and a fluctuation test. Through simulation studies that include a direct comparison with SliceLine, we demonstrate that both approaches have a satisfactory rate of false-positive results, but that the fluctuation approach has relatively higher power. We further illustrate the method on the UCI Adult Census dataset. The proposed algorithms provide a flexible framework for statistically evaluating the performance of machine learning models, and aspects of their fairness, in a wide range of applications, even with relatively small datasets.
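To make the permutation idea concrete, here is a minimal sketch of a single-split permutation test on one continuous covariate. It assumes the per-observation loss of the audited model is already computed and that the split statistic is the largest size-scaled difference in mean loss over all cut points. The names `permutation_split_test`, `max_split_stat`, `min_size`, and `n_perm` are hypothetical; this is not the authors' FairTree procedure, and it does not implement the fluctuation test. Because the maximum over all cut points is taken before comparing with the permutation distribution, the test keeps its nominal false-positive rate (under exchangeability of the losses) despite searching many thresholds, and no discretization of the covariate is needed.

```python
import numpy as np


def max_split_stat(loss_sorted, cuts):
    """Largest scaled difference in mean loss over candidate cut positions.

    loss_sorted: per-observation losses ordered by the covariate.
    cuts: integer positions k; the left group is loss_sorted[:k].
    """
    n = loss_sorted.size
    csum = np.cumsum(loss_sorted)
    left_mean = csum[cuts - 1] / cuts
    right_mean = (csum[-1] - csum[cuts - 1]) / (n - cuts)
    # Scaling by sqrt(1/k + 1/(n - k)) makes balanced and unbalanced cuts
    # comparable. The pooled standard deviation is left out because it does
    # not change under permutation and therefore cancels in the p-value.
    stat = np.abs(left_mean - right_mean) / np.sqrt(1.0 / cuts + 1.0 / (n - cuts))
    best = int(np.argmax(stat))
    return stat[best], int(cuts[best])


def permutation_split_test(x, loss, n_perm=999, min_size=10, seed=0):
    """Hypothetical helper: (statistic, threshold, p_value) for the best split."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    x = np.asarray(x, dtype=float)
    loss = np.asarray(loss, dtype=float)
    order = np.argsort(x, kind="stable")
    x_sorted, loss_sorted = x[order], loss[order]
    n = x.size
    # Admissible cuts: at least min_size observations on each side, and only
    # where the covariate value actually changes (tied values stay together).
    cuts = np.arange(min_size, n - min_size + 1)
    cuts = cuts[x_sorted[cuts - 1] < x_sorted[cuts]]
    if cuts.size == 0:
        raise ValueError("no admissible cut point")
    observed, best_k = max_split_stat(loss_sorted, cuts)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the losses breaks any link between covariate and loss
        # while keeping the candidate cut positions fixed.
        perm_stat, _ = max_split_stat(rng.permutation(loss_sorted), cuts)
        exceed += int(perm_stat >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    # Report the threshold halfway between the two neighbouring x values.
    threshold = 0.5 * (x_sorted[best_k - 1] + x_sorted[best_k])
    return float(observed), float(threshold), p_value


rng = np.random.default_rng(1)
x = rng.uniform(size=400)
# Simulated losses that are higher, on average, when x > 0.5.
loss = rng.normal(size=400) + 2.0 * (x > 0.5)
stat, threshold, p = permutation_split_test(x, loss)
print(f"threshold={threshold:.2f}, p={p:.3f}")
```

A tree-based auditor would apply such a test to every covariate, split on the most significant one after a multiplicity correction, and recurse into the resulting subgroups; those steps are omitted here.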