This paper investigates whether LLMs, when used as automated evaluators, exhibit biases based on source labels. Through a counterfactual design, the study reveals that both humans and LLMs assign higher trust to content labeled as human-authored than to the same content labeled as AI-generated. Analysis of LLM internal states shows that models allocate more attention to the label region than to the content itself, mirroring human gaze patterns and suggesting a reliance on source labels as heuristic cues.
LLMs judging content aren't as objective as we thought: they're swayed by source labels just like humans, giving "human-authored" content an unfair trust advantage.
Large language models (LLMs) are increasingly used as automated evaluators (LLM-as-a-Judge). This work challenges the reliability of that paradigm by showing that LLM trust judgments are biased by disclosed source labels. Using a counterfactual design, we find that both humans and LLM judges assign higher trust to information labeled as human-authored than to the same content labeled as AI-generated. Eye-tracking data reveal that humans rely heavily on source labels as heuristic cues for their judgments. We then analyze LLM internal states during judgment. Across label conditions, models allocate denser attention to the label region than to the content region, and this label dominance is stronger under Human labels than under AI labels, consistent with human gaze patterns. In addition, decision uncertainty measured from output logits is higher under AI labels than under Human labels. These results indicate that the source label is a salient heuristic cue for both humans and LLMs. This raises validity concerns for label-sensitive LLM-as-a-Judge evaluation, and we cautiously suggest that aligning models with human preferences may propagate human heuristic reliance into models, motivating debiased evaluation and alignment.
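The internal-state analysis described in the abstract lends itself to a short illustration. Below is a minimal sketch of how one might measure per-token attention mass on the label versus content regions and the entropy of the judge's next-token logits; the model name, prompt template, and the choice of last-layer, last-position attention are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch, NOT the paper's exact protocol: the model, prompt template,
# and the choice of last-layer / last-position attention are all assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # hypothetical judge model
tok = AutoTokenizer.from_pretrained(MODEL)
# "eager" attention guarantees attention weights are returned.
model = AutoModelForCausalLM.from_pretrained(MODEL, attn_implementation="eager")
model.eval()

def judge_signals(label: str, content: str) -> dict:
    """Mean attention on the label vs. content regions, plus logit entropy."""
    prefix, mid, suffix = "Source: ", "\nText: ", "\nHow trustworthy is this text?"
    # Tokenize the pieces separately so each region's token span is known.
    parts = [tok(s, add_special_tokens=False).input_ids
             for s in (prefix, label, mid, content, suffix)]
    input_ids = torch.tensor([sum(parts, [])])

    with torch.no_grad():
        out = model(input_ids, output_attentions=True)

    # Attention from the final position in the last layer, averaged over heads.
    att = out.attentions[-1][0].mean(dim=0)[-1]          # shape: (seq_len,)
    l0 = len(parts[0])                                   # label span start
    c0 = l0 + len(parts[1]) + len(parts[2])              # content span start
    label_att = att[l0:l0 + len(parts[1])].mean().item()
    content_att = att[c0:c0 + len(parts[3])].mean().item()

    # Decision uncertainty: entropy of the next-token distribution.
    probs = torch.softmax(out.logits[0, -1], dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
    return {"label_att": label_att, "content_att": content_att, "entropy": entropy}

text = "The new vaccine reduced hospitalizations by 40% in the trial."
print(judge_signals("human-authored", text))   # Human-label condition
print(judge_signals("AI-generated", text))     # AI-label condition
```

If the reported pattern holds, one would expect `label_att` to exceed `content_att` in both conditions, with a larger gap under the Human label and higher `entropy` under the AI label.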