This paper investigates bias redistribution in visual machine unlearning, specifically whether forgetting one demographic group (defined by age and gender) in CLIP models leads to bias amplification in other, correlated groups. Using Prompt Erasure, Prompt Reweighting, and Refusal Vector methods on the CelebA dataset, the authors find that unlearning primarily redistributes bias along gender lines, with performance consistently transferring from the Young Female to Old Female group. The Refusal Vector method reduces redistribution but at the cost of overall performance, suggesting current unlearning techniques struggle to account for embedding geometry and thus risk exacerbating bias.
Unlearning one demographic group in CLIP doesn't eliminate bias; it shifts it, primarily along gender lines, revealing a concerning gender-dominant structure in the model's embedding space.
Machine unlearning enables models to selectively forget training data, driven by privacy regulations such as GDPR and CCPA. However, its fairness implications remain underexplored: when a model forgets a demographic group, does it neutralize that concept or redistribute it to correlated groups, potentially amplifying bias? We investigate this bias redistribution phenomenon on CelebA using CLIP models (ViT-B/32, ViT-L/14, ViT-B/16) under a zero-shot classification setting across intersectional groups defined by age and gender. We evaluate three unlearning methods (Prompt Erasure, Prompt Reweighting, and Refusal Vector) using per-group accuracy shifts, demographic parity gaps, and a redistribution score. Our results show that unlearning does not eliminate bias but redistributes it, primarily along gender rather than age boundaries. In particular, removing the dominant Young Female group consistently transfers performance to Old Female across all model scales, revealing a gender-dominant structure in CLIP's embedding space. While the Refusal Vector method reduces redistribution, it fails to achieve complete forgetting and significantly degrades retained performance. These findings highlight a fundamental limitation of current unlearning methods: without accounting for embedding geometry, they risk amplifying bias in retained groups.
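The abstract names its evaluation metrics without defining them, so the following is only a minimal sketch of how per-group accuracy shifts and a redistribution-style score could be computed. Everything here is an assumption made for illustration: the short group codes (`YF`, `OF`, `YM`, `OM`), the function names, the use of a max-minus-min accuracy gap as a simple stand-in for the parity gap, and the definition of the redistribution score as the fraction of the forgotten group's accuracy loss that reappears as gains in retained groups. The toy predictions are invented to mimic the described Young Female to Old Female transfer; they are not results from the paper.

```python
from collections import Counter


def per_group_accuracy(true_groups, pred_groups):
    """Fraction of each group's samples that were assigned to that group."""
    total = Counter(true_groups)
    correct = Counter(t for t, p in zip(true_groups, pred_groups) if t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_shift(before, after):
    """Per-group change in accuracy caused by unlearning (after minus before)."""
    return {g: after[g] - before[g] for g in before}


def accuracy_gap(acc, exclude=()):
    """Max-minus-min accuracy over groups (assumed stand-in for a parity gap)."""
    vals = [a for g, a in acc.items() if g not in exclude]
    return max(vals) - min(vals)


def redistribution_score(shift, forgotten):
    """Hypothetical score: retained-group gains divided by the forgotten group's loss."""
    loss = -shift[forgotten]
    if loss <= 0:
        return 0.0  # nothing was forgotten, so nothing could be redistributed
    gained = sum(max(s, 0.0) for g, s in shift.items() if g != forgotten)
    return gained / loss


# Toy data: four samples per group; labels are made up for illustration.
true = ["YF"] * 4 + ["OF"] * 4 + ["YM"] * 4 + ["OM"] * 4
before_pred = (["YF"] * 4 + ["OF", "OF", "YF", "YF"]
               + ["YM", "YM", "YM", "OM"] + ["OM", "OM", "YM", "YM"])
# After "forgetting" YF, its samples (and the OF samples) are all labeled OF.
after_pred = (["OF"] * 4 + ["OF"] * 4
              + ["YM", "YM", "YM", "OM"] + ["OM", "OM", "YM", "YM"])

before = per_group_accuracy(true, before_pred)
after = per_group_accuracy(true, after_pred)
shift = accuracy_shift(before, after)

print("accuracy before:", before)
print("accuracy after: ", after)
print("shift:          ", shift)
print("retained gap:   ", accuracy_gap(after, exclude=("YF",)))
print("redistribution: ", redistribution_score(shift, "YF"))
```

In this toy setup the forgotten group's accuracy drops to zero while only Old Female gains, which is the gender-aligned pattern the abstract describes. A score near zero would instead indicate that the forgotten concept was neutralized without being absorbed by a retained group.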