This paper benchmarks three transformer-based sentiment classifiers on a large corpus of Holocaust oral histories, revealing significant challenges in polarity detection within this domain. The authors introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output and identify areas of systematic disagreement. Results show low to moderate inter-model agreement, largely driven by discrepancies in classifying neutral sentiment, highlighting the limitations of current sentiment models in handling nuanced historical narratives.
Sentiment models often disagree on Holocaust oral histories, not on the presence of positive or negative sentiment, but on the boundary of neutrality, revealing a critical gap in their ability to handle nuanced historical narratives.
Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous, long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of off-the-shelf sentiment classifiers on long-form Holocaust oral histories, applying three pretrained transformer-based polarity classifiers to a corpus of 107,305 utterances and 579,013 sentences. After assembling model outputs, we introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output stability. We report pairwise percent agreement, Cohen's kappa, Fleiss' kappa, and row-normalized confusion matrices to localize systematic disagreement. As an auxiliary descriptive signal, a T5-based emotion classifier is applied to stratified samples from each agreement stratum to compare emotion distributions across strata. The combination of multi-model label triangulation and the ABC taxonomy provides a cautious, operational framework for characterizing where and how sentiment models diverge in sensitive historical narratives. Inter-model agreement is low to moderate overall and is driven primarily by boundary decisions around neutrality.
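The agreement machinery described above can be sketched compactly. The following is a minimal illustration, not the paper's implementation: it assumes a three-class polarity scheme (negative/neutral/positive) and a natural reading of the ABC taxonomy in which stratum A means all three models agree, B means exactly two agree, and C means full disagreement; all function names and the toy predictions are hypothetical.

```python
import numpy as np

# Assumed three-class polarity scheme (not confirmed by the paper's label set).
LABELS = ["negative", "neutral", "positive"]

def abc_stratum(item_labels):
    """Map one item's three model labels to an agreement stratum:
    A = unanimous, B = exactly two agree, C = all three disagree."""
    return {1: "A", 2: "B", 3: "C"}[len(set(item_labels))]

def percent_agreement(a, b):
    """Pairwise percent agreement between two models' label sequences."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a == b))

def cohens_kappa(a, b, labels=LABELS):
    """Cohen's kappa: observed agreement corrected for chance agreement
    estimated from each model's marginal label distribution."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)
    p_e = sum(np.mean(a == l) * np.mean(b == l) for l in labels)
    return float((p_o - p_e) / (1 - p_e))

def fleiss_kappa(label_matrix, labels=LABELS):
    """Fleiss' kappa over an (n_items, n_raters) matrix of string labels."""
    m = np.asarray(label_matrix)
    n_items, n_raters = m.shape
    # counts[i, k] = number of models assigning label k to item i
    counts = np.stack([(m == l).sum(axis=1) for l in labels], axis=1)
    p_i = (counts * (counts - 1)).sum(axis=1) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = float((p_j ** 2).sum())
    return float((p_bar - p_e) / (1 - p_e))

# Toy example: three models (columns) on three items (rows).
preds = np.array([
    ["negative", "negative", "negative"],   # stratum A
    ["neutral",  "neutral",  "positive"],   # stratum B
    ["negative", "neutral",  "positive"],   # stratum C
])
strata = [abc_stratum(row) for row in preds]
```

A row-normalized confusion matrix between any model pair then localizes where disagreement concentrates (e.g. one model's neutral row spreading into the other's positive/negative columns, consistent with the neutrality-boundary finding).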