This paper audits the validity of using LLM judges as proxies for human readers when evaluating LLM-generated disinformation. By comparing LLM judge outputs to 2,043 human ratings on 290 articles, the study reveals significant discrepancies in overall scoring, item-level ranking, and reliance on textual signals. The key finding is that LLM judges, while internally consistent, differ substantially from human readers, overemphasizing logical rigor and underemphasizing emotional intensity.
LLM judges of disinformation risk are internally consistent but systematically misaligned with actual human readers, raising serious questions about their validity as evaluation proxies.
Large language models (LLMs) can generate persuasive narratives at scale, raising concerns about their potential use in disinformation campaigns. Assessing this risk ultimately requires understanding how readers receive such content. In practice, however, LLM judges are increasingly used as a low-cost substitute for direct human evaluation, even though it remains unclear whether they faithfully track reader responses. We recast evaluation in this setting as a proxy-validity problem and audit LLM judges against human reader responses. Using 290 aligned articles, 2,043 paired human ratings, and outputs from eight frontier judges, we examine judge-human alignment in terms of overall scoring, item-level ordering, and signal dependence. We find persistent judge-human gaps throughout. Relative to humans, judges are typically harsher, recover item-level human rankings only weakly, and rely on different textual signals, placing more weight on logical rigor while penalizing emotional intensity more strongly. At the same time, judges agree far more with one another than with human readers. These results suggest that LLM judges form a coherent evaluative group that is much more aligned internally than it is with human readers, indicating that internal agreement is not evidence of validity as a proxy for reader response.
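The item-level ordering comparison described above can be sketched as a rank-correlation check between judge scores and human ratings on the same articles. The sketch below uses Spearman's rho computed from scratch; the scores are illustrative placeholders, not values from the paper.

```python
def ranks(values):
    # Assign 1-based average ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-article scores (NOT from the paper): note the judge
# scores are uniformly lower, mirroring the "judges are harsher" finding.
human = [4.1, 3.2, 2.8, 4.5, 3.9, 2.1]
judge = [2.9, 3.1, 2.5, 3.0, 2.7, 2.6]
print(round(spearman(human, judge), 3))  # → 0.6
```

A rho near 1 would mean judges preserve the human ordering of articles even if their absolute scores differ; a low rho, as the paper reports, means judges reorder items relative to human readers, which mean-level calibration alone cannot fix.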