Feb 25, 2026 · arXiv:2602.22404

SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context

Aishwarya Verma, Laud Ammah, Olivia Nercy Ndlovu Lucas, Andrew Zaldivar, Vinodkumar Prabhakaran, Sunipa Dev

AI Summary

The paper addresses the lack of stereotype resources for evaluating generative AI model safety in Sub-Saharan Africa by creating a multilingual stereotype dataset covering Ghana, Kenya, Nigeria, and South Africa. The authors use community-engaged methods, including telephonic surveys moderated in native languages, to capture socioculturally situated stereotypes. The resulting SAFARI dataset contains 3,534 stereotypes in English and 3,206 stereotypes in 15 native languages, with the sample balanced across diverse ethnic and demographic backgrounds.

Key Contribution

SAFARI brings Sub-Saharan African perspectives to AI safety evaluation: a stereotype dataset in English and 15 native languages, built using community-engaged methods.

Abstract

Stereotype repositories are critical to assess generative AI model safety, but currently lack adequate global coverage. It is imperative to prioritize targeted expansion, strategically addressing existing deficits, over merely increasing data volume. This work introduces a multilingual stereotype resource covering four sub-Saharan African countries that are severely underrepresented in NLP resources: Ghana, Kenya, Nigeria, and South Africa. By utilizing socioculturally-situated, community-engaged methods, including telephonic surveys moderated in native languages, we establish a reproducible methodology that is sensitive to the region's complex linguistic diversity and traditional orality. By deliberately balancing the sample across diverse ethnic and demographic backgrounds, we ensure broad coverage, resulting in a dataset of 3,534 stereotypes in English and 3,206 stereotypes across 15 native languages.

Constitutional AI & AI Ethics · Data Curation & Synthetic Data · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
