This study uncovers "ghost authors": fictional names generated by large language models (LLMs) that recur across AI-generated documents, including academic publications. It shows that LLMs produce correlated ensembles of names whose co-occurrence rates exceed chance, with patterns specific to model families and versions. The study identifies 1,655 ghost-authored records on Zenodo, a CERN-operated repository, raising concerns about the integrity of academic publishing and the potential for misinformation.
LLMs are not just generating random names; they create persistent, correlated character ensembles that are infiltrating academic publishing and could undermine scholarly integrity.
These names do not exist. Elena Vasquez and Marcus Chen have never lived, yet they have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records claiming nonexistent journals with fabricated publication dates. Server-side DataCite timestamps prove deliberate backdating, and 991 records were registered in a single month. Because these records carry real DOIs registered in DataCite, they are harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate, forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
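The idea of co-occurrence rates that "far exceed chance" can be made concrete with a small lift calculation: the observed rate at which two names appear in the same document, divided by the rate expected if the names were independent. The following is a minimal sketch under stated assumptions, not the study's method. The function name `cooccurrence_lift`, the choice of lift as the statistic, and the five toy documents are all illustrative inventions; only the names themselves come from the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_lift(docs):
    """Return {frozenset({a, b}): lift} for every pair seen together.

    docs is a list of sets, each holding the names found in one document.
    lift = P(a and b) / (P(a) * P(b)); a value above 1 means the pair
    co-occurs more often than independence would predict.
    """
    n = len(docs)
    name_counts = Counter()
    pair_counts = Counter()
    for names in docs:
        unique = set(names)
        name_counts.update(unique)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(unique), 2)
        )

    lifts = {}
    for pair, joint in pair_counts.items():
        a, b = tuple(pair)
        expected = (name_counts[a] / n) * (name_counts[b] / n)
        lifts[pair] = (joint / n) / expected
    return lifts


# Toy corpus (hypothetical, for illustration only; not data from the study).
toy_docs = [
    {"Elena Vasquez", "Marcus Chen"},
    {"Elena Vasquez", "Marcus Chen"},
    {"Aris Thorne", "Lena Petrova"},
    {"Aris Thorne"},
    {"Elena Vasquez", "Aris Thorne"},
]

lifts = cooccurrence_lift(toy_docs)
for pair, lift in sorted(lifts.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(lift, 3))
```

In this toy corpus, pairs drawn from the same model family score above 1 while the cross-family pair scores below 1. A real analysis would also need a significance test and a correction for the very large number of candidate pairs, which this sketch omits.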