Mar 15, 2026 · arXiv:2603.14400

Extending Minimal Pairs with Ordinal Surprisal Curves and Entropy Across Applied Domains

AI Summary

This paper extends the minimal pairs paradigm by using surprisal curves and entropy to evaluate language models on ordinal-scaled classification and scoring tasks across several domains. Instead of relying on text generation, the approach measures the "surprise" (negative log probability) a model assigns to each position on a rating scale, which exposes both the model's preferred response and its uncertainty. Experiments across social-ecological-technological systems classification, causal statement identification, figurative language detection, and deductive qualitative coding show that surprisal curves yield interpretable classification signals, with minima near the expected scale positions, and that entropy tends to distinguish genuinely ambiguous items from easier ones.

Key Contribution

Forget binary judgments: surprisal curves reveal LLMs' nuanced preferences and uncertainties across ordinal scales, offering a more informative evaluation paradigm.

Abstract

The minimal pairs paradigm of comparing model probabilities for contrasting completions has proven useful for evaluating linguistic knowledge in language models, yet its application has largely been confined to binary grammaticality judgments over syntactic phenomena. Additionally, standard prompting-based evaluation requires expensive text generation, may elicit post-hoc rationalizations rather than model judgments, and discards information about model uncertainty. We address both limitations by extending surprisal-based evaluation from binary grammaticality contrasts to ordinal-scaled classification and scoring tasks across multiple domains. Rather than asking models to generate answers, we measure the information-theoretic "surprise" (negative log probability) they assign to each position on rating scales (e.g., 1-5 or 1-9), yielding full surprisal curves that reveal both the model's preferred response and its uncertainty via entropy. We explore this framework across four domains: social-ecological-technological systems classification, causal statement identification (binary and scaled), figurative language detection, and deductive qualitative coding. Across these domains, surprisal curves produce interpretable classification signals with clear minima near expected ordinal scale positions, and entropy over the completion tended to distinguish genuinely ambiguous items from easier items.
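As a rough illustration of the quantities the abstract describes, the sketch below turns per-position log probabilities into a surprisal curve, picks the position with the lowest surprisal, and computes entropy over the scale. Everything here is an assumption made for illustration: the function names are hypothetical, the log probabilities are made-up numbers rather than model outputs or results from the paper, the entropy is taken after renormalizing over the scale positions only, and units are nats. The paper may define or compute these quantities differently.

```python
import math

def surprisal_curve(logprobs):
    """Map each scale position to its surprisal, -log p, in nats."""
    return {pos: -lp for pos, lp in logprobs.items()}

def preferred_position(logprobs):
    """Return the scale position with the lowest surprisal."""
    curve = surprisal_curve(logprobs)
    return min(curve, key=curve.get)

def scale_entropy(logprobs):
    """Shannon entropy (nats) after renormalizing over the scale positions.

    Assumption: probability mass outside the listed positions is ignored.
    """
    top = max(logprobs.values())  # subtract the max for numerical stability
    weights = {pos: math.exp(lp - top) for pos, lp in logprobs.items()}
    total = sum(weights.values())
    probs = [w / total for w in weights.values()]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Made-up log probabilities for a 1-5 scale (not taken from the paper).
confident_item = {1: math.log(0.02), 2: math.log(0.03), 3: math.log(0.05),
                  4: math.log(0.80), 5: math.log(0.10)}
ambiguous_item = {pos: math.log(0.20) for pos in range(1, 6)}

print(preferred_position(confident_item))
print(scale_entropy(confident_item), scale_entropy(ambiguous_item))
```

In this toy setup the peaked item has a clear surprisal minimum at one position and low entropy, while the flat item reaches the maximum possible entropy for five options, log 5 nats. A real pipeline would obtain the log probabilities by scoring each scale label as a completion of the prompt, which this sketch does not attempt.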

Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
