This paper introduces a framework for stress-testing LLM-based hiring pipelines for demographic bias that arises from subtle sociocultural markers in anonymized resumes. The authors augment 100 neutral resumes into 4100 variants spanning four ethnicities and two genders, differing only in job-irrelevant markers, and evaluate 18 LLMs in two settings: direct comparison and score-and-shortlist. Results show that LLMs can infer demographic attributes from these markers and exhibit systematic disparities that favor markers associated with Chinese and Caucasian males, and that prompting for explanations tends to amplify this bias.
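The augmentation step can be pictured as a grid over base resumes and demographic groups. The sketch below is hypothetical: the `Resume` class, the `augment` function, and the placeholder group and marker labels are illustrations, not the authors' code or marker lists. It shows only the invariant the framework relies on: every variant keeps the job-relevant body verbatim and swaps in one group's job-irrelevant markers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Resume:
    resume_id: int
    body: str          # job-relevant content, kept identical across variants
    markers: tuple = ()  # job-irrelevant sociocultural markers (languages, hobbies, ...)


def augment(neutral, marker_pool):
    """Create one variant of every neutral resume per demographic group.

    marker_pool maps (ethnicity, gender) -> list of marker strings.
    Only `markers` changes; `body` is copied verbatim, so variants of the
    same resume differ solely in job-irrelevant markers.
    """
    variants = []
    for resume in neutral:
        for (ethnicity, gender), markers in marker_pool.items():
            variants.append((ethnicity, gender, replace(resume, markers=tuple(markers))))
    return variants


# Hypothetical usage: placeholder groups and markers, not the paper's data.
neutral = [Resume(i, f"Skills and experience for resume {i}") for i in range(3)]
marker_pool = {
    ("eth_A", "F"): ["Languages: <lang_A>", "Hobby: <hobby_F>"],
    ("eth_A", "M"): ["Languages: <lang_A>", "Hobby: <hobby_M>"],
    ("eth_B", "F"): ["Languages: <lang_B>", "Hobby: <hobby_F>"],
    ("eth_B", "M"): ["Languages: <lang_B>", "Hobby: <hobby_M>"],
}
variants = augment(neutral, marker_pool)
print(len(variants))  # 3 resumes x 4 groups = 12
```

This grid produces `len(neutral) × len(marker_pool)` variants; it is not meant to reproduce the paper's count of 4100, whose exact construction the summary does not spell out.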
Even after removing names and other PII, LLMs still exhibit significant demographic biases in resume screening, favoring candidates based on subtle sociocultural markers like language and hobbies.
Large Language Models (LLMs) are increasingly deployed in resume screening pipelines. Although explicit PII (e.g., names) is commonly redacted, resumes typically retain subtle sociocultural markers (languages, co-curricular activities, volunteering, hobbies) that can act as demographic proxies. We introduce a generalisable stress-test framework for hiring fairness, instantiated in the Singapore context: 100 neutral job-aligned resumes are augmented into 4100 variants spanning four ethnicities and two genders, differing only in job-irrelevant markers. We evaluate 18 LLMs in two realistic settings: (i) Direct Comparison (1v1) and (ii) Score&Shortlist (top-scoring rate), each with and without rationale prompting. Even without explicit identifiers, models recover demographic attributes with high F1 and exhibit systematic disparities, favouring markers associated with Chinese and Caucasian males. Ablations show that language markers suffice for ethnicity inference, whereas gender inference relies on hobbies and activities. Furthermore, prompting for explanations tends to amplify bias. Our findings suggest that seemingly innocuous markers surviving anonymisation can materially skew automated hiring outcomes.
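The abstract names the two evaluation settings but does not give metric formulas. As a minimal sketch, we assume that the 1v1 metric is each group's share of pairwise wins, and that the top-scoring rate is the fraction of base resumes for which a group's variant receives the highest score, with ties credited to every tied group. The functions `win_rates` and `top_scoring_rates` and the group labels are hypothetical. Model calls are omitted, and the inputs stand in for parsed LLM outputs.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Direct Comparison (1v1), assumed metric.

    comparisons: iterable of (group_a, group_b, winner) tuples, where winner
    is the group whose variant the model preferred. Returns wins / appearances.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for a, b, winner in comparisons:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} not in pair ({a!r}, {b!r})")
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {g: wins[g] / appearances[g] for g in appearances}


def top_scoring_rates(scores):
    """Score&Shortlist, assumed metric.

    scores: {resume_id: {group: score}}, one score per group's variant.
    A group "tops" a resume if its variant attains the maximum score;
    ties credit every tied group (an assumption, not from the paper).
    """
    tops = defaultdict(int)
    counts = defaultdict(int)
    for per_group in scores.values():
        best = max(per_group.values())
        for group, score in per_group.items():
            counts[group] += 1
            if score == best:
                tops[group] += 1
    return {g: tops[g] / counts[g] for g in counts}


# Hypothetical usage with made-up model outputs.
wr = win_rates([("A", "B", "A"), ("A", "B", "B"), ("A", "C", "A")])
ts = top_scoring_rates({1: {"A": 8, "B": 7}, 2: {"A": 6, "B": 6}})
print(wr)  # A: 2/3, B: 0.5, C: 0.0
print(ts)  # {'A': 1.0, 'B': 0.5}
```

Crediting all tied groups keeps the rates comparable across groups but lets them sum to more than one per resume. Breaking ties at random would be an equally plausible choice.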