Cambridge, May 27, 2026. arXiv:2605.28802

Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

Beiduo Chen, Pingjun Hong, Ziyun Zhang, Benjamin Roth, Anna Korhonen

AI Summary

This paper investigates whether LLMs can learn annotator-specific label-explanation behavior, treating human label variation (HLV) as more than simple label disagreement. The authors introduce cross-annotator preference optimization (CAPO), a fine-tuning method that contrasts a target annotator's response with other valid annotations of the same input. Experiments on NLI and paraphrase judgment tasks show that CAPO outperforms prompting and standard supervised fine-tuning at capturing annotator-specific reasoning patterns, as validated by human evaluation and judge-based attribution.

Key Contribution

LLMs can learn to mimic the reasoning patterns of individual annotators when each annotator's explanations are contrasted with those of other annotators on the same input, opening the door to more personalized and scalable annotation strategies.

Abstract

Free-text explanations extend human label variation (HLV) beyond label disagreement by revealing the reasoning and preferences behind annotators' decisions. We study whether large language models (LLMs) can learn and reproduce such annotator-specific label-explanation behavior. Using two sentence-pair tasks with four annotators each -- natural language inference and paraphrase judgment -- we first analyze whether annotators exhibit stable individual patterns. We find that such patterns are weak at the single-annotation level due to strong input-content effects, but become detectable after input-content reduction and annotator-level aggregation. We then compare prompting and supervised fine-tuning (SFT) baselines and propose cross-annotator preference optimization (CAPO), which contrasts a target annotator's response with other valid but less target-specific annotations for the same input. Experiments show that prompting is limited and unstable, SFT better captures annotator-specific behavior, and CAPO further improves aggregation-aware imitation and judge-based attribution while preserving target-specific reasoning patterns under human validation. Overall, our results show that HLV can be learned as annotator-specific label-explanation behavior, suggesting a path toward scalable explanation-based annotation grounded in annotator histories rather than labels alone.
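The contrast that CAPO relies on can be illustrated with a minimal sketch of preference-pair construction. It assumes a DPO-style (chosen, rejected) data format, which the abstract does not confirm, and it says nothing about the training objective the authors actually use. The names `Annotation`, `PreferencePair`, and `build_pairs` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    input_id: str   # identifies the sentence pair being judged
    annotator: str  # annotator identifier
    response: str   # label plus free-text explanation


class PreferencePair(NamedTuple):
    input_id: str
    chosen: str     # the target annotator's response
    rejected: str   # another annotator's valid response to the same input


def build_pairs(annotations, target):
    """Pair the target annotator's response with every other annotator's
    response to the same input (assumed format, not the paper's code)."""
    by_input = defaultdict(list)
    for ann in annotations:
        by_input[ann.input_id].append(ann)

    pairs = []
    for input_id, group in by_input.items():
        chosen = [a for a in group if a.annotator == target]
        others = [a for a in group if a.annotator != target]
        for c in chosen:
            for o in others:
                # Identical responses carry no contrast, so skip them.
                if o.response != c.response:
                    pairs.append(PreferencePair(input_id, c.response, o.response))
    return pairs


annotations = [
    Annotation("p1", "A", "entailment: the premise states it directly"),
    Annotation("p1", "B", "neutral: the premise leaves the timing open"),
    Annotation("p1", "C", "entailment: the hypothesis restates the premise"),
    Annotation("p2", "B", "paraphrase: same meaning, different word order"),
    Annotation("p2", "C", "not paraphrase: the second sentence adds a detail"),
]
pairs_for_a = build_pairs(annotations, target="A")
```

In this toy data, annotator A only labeled input `p1`, so only `p1` yields pairs for A. Inputs the target annotator never saw contribute nothing, since there is no target response to prefer.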

Interpretability & Mechanistic Interp, Natural Language Processing, RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
