This paper introduces a token-level perplexity framework to analyze whether LLMs rely on linguistically relevant cues, contrasting benchmark performance with underlying mechanisms. The method compares perplexity distributions over minimal sentence pairs differing in pivotal tokens to test specific linguistic hypotheses. Experiments on controlled benchmarks reveal that LLMs' perplexity shifts are not fully explained by linguistically important tokens, indicating a reliance on unexpected heuristics.
LLMs ace linguistic benchmarks, but a token-level perplexity analysis reveals they're often relying on the wrong cues.
Standard evaluations of large language models (LLMs) focus on task performance, offering limited insight into whether correct behavior reflects appropriate underlying mechanisms and risking confirmation bias. We introduce a simple, principled interpretability framework based on token-level perplexity to test whether models rely on linguistically relevant cues. By comparing perplexity distributions over minimal sentence pairs differing in one or a few 'pivotal' tokens, our method enables precise, hypothesis-driven analysis without relying on unstable feature-attribution techniques. Experiments on controlled linguistic benchmarks with several open-weight LLMs show that, while linguistically important tokens influence model behavior, they never fully explain perplexity shifts, revealing that models rely on heuristics other than the expected linguistic ones.
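To make the core idea concrete, the sketch below shows how one might compute per-token surprisal and sentence perplexity for a minimal pair using an open-weight causal LM. This is an illustrative assumption, not the paper's code: the model name (`gpt2`) and the agreement example are placeholders, and the paper's framework compares full perplexity distributions over benchmark pairs rather than a single pair.

```python
# Minimal sketch (not the paper's implementation): per-token surprisal and
# sentence perplexity for a minimal pair, via Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any open-weight LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def token_surprisals(sentence: str):
    """Return (token, surprisal) pairs, where surprisal is the negative log-likelihood in nats."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Shift so that token t is predicted from tokens < t.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    tokens = tokenizer.convert_ids_to_tokens(targets[0])
    return list(zip(tokens, nll[0].tolist()))

# Minimal pair differing in one pivotal token (subject-verb agreement).
for sent in ["The keys to the cabinet are on the table.",
             "The keys to the cabinet is on the table."]:
    surprisals = token_surprisals(sent)
    ppl = torch.exp(torch.tensor([s for _, s in surprisals]).mean()).item()
    print(f"{sent!r}  perplexity={ppl:.2f}")
    # Inspecting per-token surprisal shows whether the pivotal token
    # (here the verb) accounts for the perplexity shift, or whether
    # other tokens contribute unexpectedly.
```

If the perplexity difference between the two sentences is driven mostly by tokens other than the pivotal one, that is the kind of evidence the paper interprets as reliance on heuristics beyond the expected linguistic cue.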