Samsung | Warsaw | Jun 1, 2026 | arXiv:2606.02184

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

Michał Brzozowski, Neo Christopher Chung

AI Summary

This study documents "ghost authors": fictional names generated by large language models (LLMs) that recur across AI-generated documents, including academic publications. It shows that LLMs produce correlated ensembles of names whose co-occurrence rates far exceed chance, with specific patterns tied to model families and versions. The study identifies 1,655 ghost-authored records on Zenodo, a CERN-operated repository that mints real DataCite DOIs, raising concerns about the integrity of academic publishing and the potential for misinformation.

Key Contribution

LLMs do not merely default to high-probability individual names; they produce persistent, correlated character ensembles that have reached academic publishing and could undermine scholarly integrity.

Abstract

These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records claiming nonexistent journals with fabricated publication dates: server-side DataCite timestamps prove deliberate backdating, and 991 records were registered in a single month; these carry real DOIs registered in DataCite, making them harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
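One way to make "co-occurrence rates far exceed chance" concrete is a lift score: the observed number of documents containing both names, divided by the number expected if the two names appeared independently. The sketch below is illustrative only and is not the authors' method; the function name `pair_lift` and the toy corpus are assumptions introduced here, and the toy document memberships are invented rather than taken from the paper's data.

```python
from collections import Counter
from itertools import combinations


def pair_lift(documents):
    """Return {(name_a, name_b): lift} for every pair seen together.

    lift = observed joint count / expected joint count under independence.
    A lift well above 1 means the pair co-occurs more often than chance.
    `documents` is a list of name lists, one list per document.
    """
    n_docs = len(documents)
    name_counts = Counter()
    pair_counts = Counter()
    for names in documents:
        unique = sorted(set(names))  # count each name once per document
        name_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    lifts = {}
    for (a, b), joint in pair_counts.items():
        # Expected joint count if a and b were placed independently.
        expected = name_counts[a] * name_counts[b] / n_docs
        lifts[(a, b)] = joint / expected
    return lifts


# Hypothetical toy corpus: four documents, invented for illustration.
toy_docs = [
    ["Elena Vasquez", "Marcus Chen"],
    ["Elena Vasquez", "Marcus Chen"],
    ["Aris Thorne"],
    ["Aris Thorne"],
]
lifts = pair_lift(toy_docs)
```

In this toy corpus each of the two paired names appears in half the documents, so independence predicts one joint document out of four, while two are observed. A real analysis would also need a significance test and a corpus large enough for the expected counts to be meaningful.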

Constitutional AI & AI Ethics | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
