Mar 10, 2026 | arXiv:2603.09884

Benchmarking Political Persuasion Risks Across Frontier Large Language Models

AI Summary

This paper benchmarks the political persuasion capabilities of seven frontier LLMs from Anthropic, OpenAI, Google, and xAI across bipartisan issues and finds that they outperform standard campaign advertisements. Claude models were the most persuasive and Grok the least. The effect of information-based prompts was model-dependent: they increased the persuasiveness of Claude and Grok but substantially reduced that of GPT. The authors also introduce an LLM-assisted conversation analysis approach to identify the persuasive strategies each model uses.

Key Contribution

Frontier LLMs persuade more effectively than standard campaign ads, with Claude models the most persuasive, but information-based prompts reduce GPT's persuasiveness rather than boosting it.

Abstract

Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier models warrants further study. In two survey experiments (N=19,145) across bipartisan issues and stances, we evaluate seven state-of-the-art LLMs developed by Anthropic, OpenAI, Google, and xAI. We find that LLMs outperform standard campaign advertisements, with heterogeneity in performance across models. Specifically, Claude models exhibit the highest persuasiveness, while Grok exhibits the lowest. The results are robust across issues and stances. Moreover, in contrast to the findings in Hackenburg et al. (2025b) and Lin et al. (2025) that information-based prompts boost persuasiveness, we find that the effectiveness of information-based prompts is model-dependent: they increase the persuasiveness of Claude and Grok while substantially reducing that of GPT. We introduce a data-driven and strategy-agnostic LLM-assisted conversation analysis approach to identify and assess underlying persuasive strategies. Our work benchmarks the persuasive risks of frontier models and provides a framework for cross-model comparative risk assessment.

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
