This paper introduces the problem of differential subgroup discovery, which aims to identify subsets of two populations whose members share similar features but differ significantly in a target outcome. The authors formulate an optimization objective for discovering these subgroups and give conditions under which the subgroups admit a causal interpretation. They propose DiffSub, a gradient-based method for tabular data, and demonstrate that it identifies informative subgroups across several applications, including medical case studies and model error analysis.
Uncover hidden drivers of disparity: pinpoint the specific combinations of characteristics that explain outcome gaps between populations.
We study the problem of understanding where two populations differ within a feature space, which we formalize through the concept of a differential subgroup: a subset of individuals from both populations who, despite sharing similar characteristics, exhibit exceptional differences in a target outcome. Differential subgroups reveal the regions of the feature space where population-level gaps are most pronounced and can help practitioners identify the covariate combinations that are structurally responsible for these differences, e.g., in clinical analysis, model diagnostics, or treatment-effect studies. We introduce a general optimization objective for discovering differential subgroups and establish conditions under which the resulting subgroups admit a causal interpretation of population differences. We propose DiffSub, a gradient-based approach that discovers interpretable differential subgroups in tabular data. Across synthetic benchmarks, medical case studies, model-error analyses, and treatment-effect settings, DiffSub identifies informative subgroups that reveal where population differences arise and why.
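To make the definition of a differential subgroup concrete, here is a minimal sketch of how one candidate subgroup might be scored. It is not DiffSub and not the paper's objective. The interval-rule subgroup description (`Box`), the helper names `in_box` and `differential_score`, and the size-weighted score `|gap| * coverage` are all hypothetical choices made for illustration. The sketch also does not model the paper's conditions for causal interpretation.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical subgroup description: feature index -> inclusive (low, high) bounds.
Box = Dict[int, Tuple[float, float]]


def in_box(x: Sequence[float], box: Box) -> bool:
    """Return True if feature vector x satisfies every interval condition in box."""
    return all(lo <= x[j] <= hi for j, (lo, hi) in box.items())


def differential_score(
    X_a: List[Sequence[float]], y_a: List[float],
    X_b: List[Sequence[float]], y_b: List[float],
    box: Box, min_size: int = 1,
) -> Tuple[float, float]:
    """Score a candidate subgroup by the outcome gap between populations inside it.

    Returns (gap, score), where gap = mean(y_a) - mean(y_b) restricted to the box,
    and score = |gap| * coverage. Coverage is the smaller of the two populations'
    in-box fractions; this size weighting is an illustrative assumption that keeps
    tiny subgroups from dominating.
    """
    ya_in = [y for x, y in zip(X_a, y_a) if in_box(x, box)]
    yb_in = [y for x, y in zip(X_b, y_b) if in_box(x, box)]
    # Both populations must be represented in the subgroup; otherwise the gap is undefined.
    if len(ya_in) < min_size or len(yb_in) < min_size:
        return 0.0, 0.0
    gap = mean(ya_in) - mean(yb_in)
    coverage = min(len(ya_in) / len(X_a), len(yb_in) / len(X_b))
    return gap, abs(gap) * coverage


# Toy data (made up for illustration): one feature, e.g. age.
X_a, y_a = [[30], [40], [65], [70]], [1.0, 1.0, 5.0, 7.0]
X_b, y_b = [[35], [45], [62], [75]], [2.0, 2.0, 2.0, 2.0]

print(differential_score(X_a, y_a, X_b, y_b, {0: (60, 80)}))   # (4.0, 2.0)
print(differential_score(X_a, y_a, X_b, y_b, {0: (0, 100)}))   # (1.5, 1.5)
```

In this toy example, the population-level gap of 1.5 hides a gap of 4.0 in the older subgroup, which is the kind of region a differential subgroup is meant to surface. Because this hard interval rule is not differentiable, a gradient-based method such as DiffSub would presumably rely on some relaxed subgroup membership instead; the exact formulation is given in the paper and is not reproduced here.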