Mar 31, 2026 · arXiv:2603.29454

Authorship Impersonation via LLM Prompting does not Evade Authorship Verification Methods

AI Summary

This paper investigates whether GPT-4o can generate authorial impersonations convincing enough to evade authorship verification (AV) systems. The authors generated impersonation texts in three genres (emails, text messages, and social media posts) under four prompting conditions. They then evaluated these texts against both non-neural and neural AV methods. The LLM-generated texts failed to bypass the established AV systems. Some methods rejected impersonations even more accurately than genuine negative samples, a result the authors attribute, at least in part, to the higher lexical diversity and entropy of LLM outputs.
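The summary credits two properties for this effect. As a rough illustration, the sketch below computes a type-token ratio, which is one simple lexical-diversity measure, and the Shannon entropy of the word-unigram distribution. The page does not say which diversity or entropy measures the paper actually uses. The definitions, the regex tokenizer, and the function names are therefore assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens via a simplistic regex tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word types divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_entropy(tokens: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical word-unigram distribution."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical examples: a repetitive text message vs. a more varied phrasing.
    for text in ("ok ok see you soon ok",
                 "Certainly! Looking forward to catching up tomorrow afternoon."):
        toks = tokenize(text)
        print(f"TTR={type_token_ratio(toks):.2f}  H={unigram_entropy(toks):.2f} bits")
```

The more varied text scores higher on both measures. Raw type-token ratio falls as texts get longer, so comparisons like this are only meaningful between texts of similar length.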

Key Contribution

LLM-generated authorial impersonations, despite their fluency, remain detectable by existing authorship verification methods. Some methods even reject them more accurately than genuine negative samples (texts written by other real authors).

Abstract

Authorship verification (AV), the task of determining whether a questioned text was written by a specific individual, is a critical part of forensic linguistics. While manual authorial impersonation by perpetrators has long been a recognized threat in historical forensic cases, recent advances in large language models (LLMs) raise new challenges, as adversaries may exploit these tools to impersonate another's writing. This study investigates whether prompted LLMs can generate convincing authorial impersonations and whether such outputs can evade existing forensic AV systems. Using GPT-4o as the adversary model, we generated impersonation texts under four prompting conditions across three genres: emails, text messages, and social media posts. We then evaluated these outputs against both non-neural AV methods (n-gram tracing, Ranking-Based Impostors Method, LambdaG) and neural approaches (AdHominem, LUAR, STAR) within a likelihood-ratio framework. Results show that LLM-generated texts failed to sufficiently replicate authorial individuality to bypass established AV systems. We also observed that some methods achieved even higher accuracy when rejecting impersonation texts compared to genuine negative samples. Overall, these findings indicate that, despite the accessibility of LLMs, current AV systems remain robust against entry-level impersonation attempts across multiple genres. Furthermore, we demonstrate that this counter-intuitive resilience stems, at least in part, from the higher lexical diversity and entropy inherent in LLM-generated texts.
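The abstract names n-gram tracing among the non-neural methods and evaluates every method within a likelihood-ratio framework. The sketch below illustrates both ideas but is not the paper's implementation.

First, it scores a questioned text by the share of its word n-gram types that also occur in the candidate author's known writing. This is a simplified reading of n-gram tracing. Second, it converts that score into a log10 likelihood ratio by comparing two Gaussian score distributions. These distributions are fitted on hypothetical same-author and different-author calibration scores.

The whitespace tokenizer, the Gaussian score model, the calibration values, and all function names are assumptions. The page does not describe how the paper calibrates scores. Under an overlap score like this, a text with many novel word choices shares a smaller fraction of its n-grams with the known writing. That is one intuitive, though not source-confirmed, way higher lexical diversity could push impersonations toward rejection.

```python
import math
from collections.abc import Callable
from statistics import mean, stdev


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Set of lowercase word n-gram types (whitespace tokenization, an assumption)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_tracing_score(questioned: str, known: str, n: int = 2) -> float:
    """Share of the questioned text's n-gram types that also occur in the known text."""
    q = word_ngrams(questioned, n)
    if not q:
        return 0.0
    return len(q & word_ngrams(known, n)) / len(q)


def _log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Natural log of the normal density, kept in log space to avoid underflow."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def fit_log10_lr(same_scores: list[float],
                 diff_scores: list[float]) -> Callable[[float], float]:
    """Fit one Gaussian per hypothesis; return a score -> log10 LR function.

    Positive values favour "same author", negative values "different author".
    Each list needs at least two scores with non-zero spread.
    """
    mu_s, sd_s = mean(same_scores), stdev(same_scores)
    mu_d, sd_d = mean(diff_scores), stdev(diff_scores)

    def log10_lr(score: float) -> float:
        ln_lr = _log_normal_pdf(score, mu_s, sd_s) - _log_normal_pdf(score, mu_d, sd_d)
        return ln_lr / math.log(10)

    return log10_lr


if __name__ == "__main__":
    known = "see you soon mate running late again"
    questioned = "running late see you soon"
    score = ngram_tracing_score(questioned, known)  # 3 of 4 bigram types shared
    # Hypothetical calibration scores from texts with known ground truth.
    to_lr = fit_log10_lr([0.6, 0.7, 0.8], [0.1, 0.2, 0.3])
    print(f"score={score:.2f}  log10 LR={to_lr(score):.2f}")
```

A log10 LR above 0 (an LR above 1) supports the same-author hypothesis. Gaussian score models are used here only to keep the example short. With real calibration data, logistic-regression calibration or kernel density estimates are common alternatives.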

Eval Frameworks & Benchmarks · Natural Language Processing · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
