Cambridge, Edinburgh, Groningen, Imperial, MBZUAI, NTU
Jun 10, 2026
arXiv:2606.12088

Debiasing Without Protected Attributes: Latent Concept Erasure from Textual Profiles

Shun Shao, Zheng Zhao, Anna Korhonen, Yftah Ziser, Shay B. Cohen

AI Summary

This paper introduces H-SAL, a method for debiasing language models without access to protected attributes such as gender or race, which are often unavailable because of privacy and legal constraints. H-SAL uses self-description text as an implicit signal for post-hoc concept and attribute erasure, and the authors report that this indirect debiasing often matches or outperforms methods that use explicit labels. The study also introduces a multi-domain, Stack Exchange-based fairness benchmark for helpfulness prediction, supporting representation-level fairness research in NLP.

Key Contribution

Debiasing with implicit self-description text often matches or outperforms traditional methods that rely on explicit sensitive-attribute labels.

Abstract

Most fairness research in NLP assumes direct access to protected attributes such as gender, race, or nationality. In practice, however, such information is often unavailable due to privacy constraints, missing metadata, or legal restrictions, even though models may infer it from indirect textual cues. This raises a key question: can debiasing succeed without direct access to sensitive attributes? We propose H-SAL, which performs post-hoc concept and attribute erasure using self-description text as an implicit debiasing signal. To support this setting, we introduce a multi-domain Stack Exchange-based fairness benchmark for helpfulness prediction that includes both explicit and implicit signals, enabling comparison between standard debiasing with protected labels and debiasing without access to sensitive information. Across encoder and decoder-only language models, we find that implicit self-description often matches or outperforms explicit-label-based debiasing. Our results broaden representation-level fairness research and provide a new benchmark for studying debiasing under realistic data constraints.
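The abstract does not describe how H-SAL carries out erasure, so the following is only a generic illustration of post-hoc erasure driven by an auxiliary signal, not the paper's method. It is a minimal sketch that assumes representations `X` and a stand-in matrix `Z` for embedded self-description text, and it removes from `X` the directions that covary most strongly with `Z` via an SVD of their cross-covariance. The function names, the synthetic data, and the choice of `k` are all hypothetical.

```python
import numpy as np


def cross_cov(X, Z):
    """Empirical cross-covariance between columns of X and columns of Z."""
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    return Xc.T @ Zc / (X.shape[0] - 1)


def erase_top_directions(X, Z, k=1):
    """Project out of X the k directions that covary most with Z.

    Generic illustration only; NOT the H-SAL procedure from the paper.
    """
    U, _, _ = np.linalg.svd(cross_cov(X, Z), full_matrices=False)
    U_k = U[:, :k]                            # top-k left singular vectors
    P = np.eye(X.shape[1]) - U_k @ U_k.T      # orthogonal projection matrix
    return X @ P, P


# Synthetic data (assumption): one latent concept leaks into X and Z.
rng = np.random.default_rng(0)
n = 500
concept = rng.normal(size=(n, 1))
Z = np.hstack([concept + 0.1 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 2))])      # stand-in for text embeddings
X = rng.normal(size=(n, 8))
X[:, 0] += 3.0 * concept[:, 0]                # concept leaks into feature 0

X_clean, P = erase_top_directions(X, Z, k=1)
norm_before = np.linalg.norm(cross_cov(X, Z))
norm_after = np.linalg.norm(cross_cov(X_clean, Z))
print(f"cross-covariance norm before: {norm_before:.3f}, after: {norm_after:.3f}")
```

In this toy setting no protected label is ever used: the only supervision is the auxiliary matrix `Z`, which mirrors the setting the abstract describes at a very coarse level.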

Constitutional AI & AI Ethics, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
