This paper introduces FairBED, a Bayesian experimental design (BED) framework for gathering fairer datasets. It quantifies the fairness of a dataset by how uninformative that dataset is about sensitive attributes, then builds acquisition objectives that maximize expected information gain about the target quantity while minimizing expected information gain about those attributes. Empirically, models trained on FairBED-acquired data achieve better fairness-accuracy trade-offs than models trained on randomly acquired data or on data gathered with conventional BED.
FairBED shows that the data acquisition process itself can be designed to reduce bias, yielding data better suited to training fair machine learning models.
Frameworks for ensuring fairness in machine learning typically focus on learning fair models from existing data. But this endeavor is often undermined by biases already present in that data. We therefore look to modify the data acquisition process itself to help gather fairer data that is inherently more suitable for training fair predictors. To this end, we introduce FairBED, which provides novel formulations for quantifying the fairness of datasets themselves based on the idea that fair datasets should be uninformative about sensitive attributes. We then use this to construct practical fairness-aware Bayesian experimental design (BED) objectives that maximize expected information gain about the target quantity of interest while minimizing expected information gain about sensitive attributes. We further derive a theoretical link between FairBED and demographic parity, and show empirically that models trained on data gathered using FairBED provide improved fairness-accuracy trade-offs compared to randomly acquired data and conventional BED.
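The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a toy discrete setting in which each candidate design is summarized by two known joint probability tables (outcome versus target, outcome versus sensitive attribute), it uses exact mutual information as a stand-in for expected information gain, and it combines the two terms with a hypothetical trade-off weight `lam`. The names `mutual_information`, `fairness_aware_score`, `best_design`, and the example tables are all invented for illustration.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information (in nats) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalize defensively
    px = joint.sum(axis=1, keepdims=True)       # row marginal, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)       # column marginal, shape (1, m)
    mask = joint > 0                            # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def fairness_aware_score(joint_target, joint_sensitive, lam=1.0):
    """Information about the target minus lam times information about the
    sensitive attribute. `lam` is an assumed trade-off weight."""
    return mutual_information(joint_target) - lam * mutual_information(joint_sensitive)


def best_design(designs, lam=1.0):
    """Return the name of the design with the highest fairness-aware score."""
    return max(designs, key=lambda name: fairness_aware_score(*designs[name], lam=lam))


# Hypothetical candidate designs: (outcome-vs-target table, outcome-vs-sensitive table).
designs = {
    # Very informative about the target, but also informative about the sensitive attribute.
    "A": (np.array([[0.45, 0.05], [0.05, 0.45]]),
          np.array([[0.40, 0.10], [0.10, 0.40]])),
    # Less informative about the target, but independent of the sensitive attribute.
    "B": (np.array([[0.40, 0.10], [0.10, 0.40]]),
          np.array([[0.25, 0.25], [0.25, 0.25]])),
}

conventional_choice = best_design(designs, lam=0.0)  # ignores the sensitive attribute
fair_choice = best_design(designs, lam=1.0)          # penalizes sensitive information
```

With `lam=0.0` the score reduces to information about the target alone and design A is selected; with `lam=1.0` the penalty on sensitive information outweighs A's advantage and design B is selected. In a realistic setting the joint distributions are not available in closed form, so both information terms would have to be estimated rather than computed exactly.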