CeBAMD environment, Department of Statistical Science, Department of Statistics, Nuffield Department of Medicine, Oxford, UCL
Jun 22, 2026
arXiv:2606.23515

FairBED: A Bayesian Experimental Design Approach to Gathering Fairer Data

Marcel Hedman, Emily Alger, Brieuc Lehmann, Chris Holmes, Tom Rainforth

AI Summary

This paper introduces FairBED, a Bayesian experimental design framework for gathering fairer datasets. It quantifies the fairness of a dataset by how uninformative that dataset is about sensitive attributes, and it selects acquisitions that maximize expected information gain about the target quantity while minimizing expected information gain about sensitive attributes. Empirically, models trained on FairBED-acquired data achieve better fairness-accuracy trade-offs than models trained on randomly acquired data or on data gathered with conventional BED.

Key Contribution

FairBED shows that the data acquisition process itself can be designed to gather data that is less informative about sensitive attributes, which yields models with improved fairness-accuracy trade-offs.

Abstract

Frameworks for ensuring fairness in machine learning typically focus on learning fair models from existing data. But this endeavor is often undermined by biases already present in that data. We therefore look to modify the data acquisition process itself to help gather fairer data that is inherently more suitable for training fair predictors. To this end, we introduce FairBED, which provides novel formulations for quantifying the fairness of datasets themselves based on the idea that fair datasets should be uninformative about sensitive attributes. We then use this to construct practical fairness-aware Bayesian experimental design (BED) objectives that maximize expected information gain about the target quantity of interest while minimizing expected information gain about sensitive attributes. We further derive a theoretical link between FairBED and demographic parity, and show empirically that models trained on data gathered using FairBED provide improved fairness-accuracy trade-offs compared to randomly acquired data and conventional BED.
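To make the trade-off described in the abstract concrete, here is a minimal sketch on a toy discrete model. It is not the paper's objective: the subtraction form `EIG(target) - lam * EIG(sensitive)`, the weight `lam`, the function names, and the two candidate designs "A" and "B" are all illustrative assumptions. For discrete variables, the expected information gain about a quantity equals the mutual information between that quantity and the outcome, which is what the code computes.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a 2-D joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a), shape (A, 1)
    pb = joint.sum(axis=0, keepdims=True)   # marginal p(b), shape (1, B)
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa * pb)[mask])))

def eig_pair(prior, likelihood):
    """Return (EIG about theta, EIG about s) for one design.

    prior:      p(theta, s), shape (T, S)
    likelihood: p(y | theta, s) under this design, shape (T, S, Y)
    """
    joint = prior[:, :, None] * likelihood              # p(theta, s, y)
    eig_theta = mutual_information(joint.sum(axis=1))   # I(theta; y)
    eig_s = mutual_information(joint.sum(axis=0))       # I(s; y)
    return eig_theta, eig_s

def select_design(prior, likelihoods, lam):
    """Pick the design maximizing EIG(theta) - lam * EIG(s).

    The penalized form and `lam` are assumptions made for illustration.
    """
    scores = {}
    for name, lik in likelihoods.items():
        eig_theta, eig_s = eig_pair(prior, lik)
        scores[name] = eig_theta - lam * eig_s
    best = max(scores, key=scores.get)
    return best, scores

def bernoulli_likelihood(p1):
    """Turn a table of p(y=1 | theta, s) into an array of shape (T, S, 2)."""
    p1 = np.asarray(p1, dtype=float)
    return np.stack([1.0 - p1, p1], axis=-1)

# Hypothetical toy problem: binary target theta, binary sensitive attribute s.
prior = np.full((2, 2), 0.25)
designs = {
    # Design A: outcome depends strongly on theta, but also on s.
    "A": bernoulli_likelihood([[0.05, 0.35], [0.65, 0.95]]),
    # Design B: outcome depends only on theta, somewhat more weakly.
    "B": bernoulli_likelihood([[0.25, 0.25], [0.75, 0.75]]),
}

best_plain, _ = select_design(prior, designs, lam=0.0)   # conventional BED
best_fair, _ = select_design(prior, designs, lam=2.0)    # fairness-penalized
print(best_plain, best_fair)
```

In this toy setup, design A is more informative about the target but also reveals something about the sensitive attribute, while design B reveals nothing about it. With `lam=0.0` the selection reduces to ordinary information-gain maximization; raising `lam` shifts the choice toward the design that leaks less about the sensitive attribute.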

Constitutional AI & AI Ethics, Data Curation & Synthetic Data

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
