This paper benchmarks three transformer-based sentiment classifiers on a large corpus of Holocaust oral histories, revealing significant challenges in polarity detection within this domain. The authors introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output and identify areas of systematic disagreement. Results show low to moderate inter-model agreement, largely driven by discrepancies in classifying neutral sentiment, highlighting the limitations of current sentiment models in handling nuanced historical narratives.
Sentiment models often disagree on Holocaust oral histories, not on the presence of positive or negative sentiment, but on the boundary of neutrality, revealing a critical gap in their ability to handle nuanced historical narratives.
Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous, long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of off-the-shelf sentiment classifiers on long-form Holocaust oral histories, applying three pretrained transformer-based polarity classifiers to a corpus of 107,305 utterances and 579,013 sentences. After assembling model outputs, we introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output stability. We report pairwise percent agreement, Cohen's kappa, Fleiss' kappa, and row-normalized confusion matrices to localize systematic disagreement. As an auxiliary descriptive signal, a T5-based emotion classifier is applied to stratified samples from each agreement stratum to compare emotion distributions across strata. The combination of multi-model label triangulation and the ABC taxonomy provides a cautious, operational framework for characterizing where and how sentiment models diverge in sensitive historical narratives. Inter-model agreement is low to moderate overall and is driven primarily by boundary decisions around neutrality.
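The agreement machinery described above can be sketched compactly. The following is a minimal illustration, not the paper's implementation: it assumes a three-class polarity scheme (negative/neutral/positive) and a natural reading of the ABC taxonomy in which stratum A means all three models agree, B means exactly two agree, and C means full disagreement; all function names and the toy predictions are hypothetical.

```python
import numpy as np

# Assumed three-class polarity scheme (not confirmed by the paper's label set).
LABELS = ["negative", "neutral", "positive"]

def abc_stratum(item_labels):
    """Map one item's three model labels to an agreement stratum:
    A = unanimous, B = exactly two agree, C = all three disagree."""
    return {1: "A", 2: "B", 3: "C"}[len(set(item_labels))]

def percent_agreement(a, b):
    """Pairwise percent agreement between two models' label sequences."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a == b))

def cohens_kappa(a, b, labels=LABELS):
    """Cohen's kappa: observed agreement corrected for chance agreement
    estimated from each model's marginal label distribution."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)
    p_e = sum(np.mean(a == l) * np.mean(b == l) for l in labels)
    return float((p_o - p_e) / (1 - p_e))

def fleiss_kappa(label_matrix, labels=LABELS):
    """Fleiss' kappa over an (n_items, n_raters) matrix of string labels."""
    m = np.asarray(label_matrix)
    n_items, n_raters = m.shape
    # counts[i, k] = number of models assigning label k to item i
    counts = np.stack([(m == l).sum(axis=1) for l in labels], axis=1)
    p_i = (counts * (counts - 1)).sum(axis=1) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = float((p_j ** 2).sum())
    return float((p_bar - p_e) / (1 - p_e))

# Toy example: three models (columns) on three items (rows).
preds = np.array([
    ["negative", "negative", "negative"],   # stratum A
    ["neutral",  "neutral",  "positive"],   # stratum B
    ["negative", "neutral",  "positive"],   # stratum C
])
strata = [abc_stratum(row) for row in preds]
```

A row-normalized confusion matrix between any model pair then localizes where disagreement concentrates (e.g. one model's neutral row spreading into the other's positive/negative columns, consistent with the neutrality-boundary finding).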