Tsinghua AI | Independent Researcher | Feb 25, 2026 | arXiv:2602.22145

When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models

Satyam Kumar Navneet, Joydeep Chandra

AI Summary

This paper introduces "Cultural Ghosting," the systematic erasure by LLMs of linguistic markers unique to non-native English varieties, and quantifies it with two metrics: the Identity Erasure Rate (IER) and the Semantic Preservation Score (SPS). The authors analyzed 22,350 LLM outputs generated from 1,490 culturally marked texts (Indian, Singaporean, and Nigerian English) processed by five models under three prompt conditions. They report an overall IER of 10.26% and a "Semantic Preservation Paradox": models maintain high semantic similarity (mean SPS = 0.748) while erasing cultural markers. Pragmatic markers such as politeness conventions are 1.9x more vulnerable to erasure than lexical markers, and explicit cultural-preservation prompts reduce erasure by 29%.

Key Contribution

Depending on the model, LLMs erase up to 20.5% of cultural markers while preserving the core meaning, a "Semantic Preservation Paradox" that threatens linguistic diversity.

Abstract

Large Language Models (LLMs) are increasingly used to "professionalize" workplace communication, often at the cost of linguistic identity. We introduce "Cultural Ghosting", the systematic erasure of linguistic markers unique to non-native English varieties during text processing. Through analysis of 22,350 LLM outputs generated from 1,490 culturally marked texts (Indian, Singaporean, and Nigerian English) processed by five models under three prompt conditions, we quantify this phenomenon using two novel metrics: Identity Erasure Rate (IER) and Semantic Preservation Score (SPS). Across all prompts, we find an overall IER of 10.26%, with model-level variation from 3.5% to 20.5% (5.9x range). Crucially, we identify a Semantic Preservation Paradox: models maintain high semantic similarity (mean SPS = 0.748) while systematically erasing cultural markers. Pragmatic markers (politeness conventions) are 1.9x more vulnerable than lexical markers (71.5% vs. 37.1% erasure). Our experiments demonstrate that explicit cultural-preservation prompts reduce erasure by 29% without sacrificing semantic quality.

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 29

Year: 2026

Venue: N/A
