MIT CSAIL | Mar 12, 2026 | arXiv:2603.12105

To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times

Thomas Hikaru Clark, Carlos Arriaga, Javier Conde, Gonzalo Martínez, Pedro Reviriego

AI Summary

This paper investigates whether Large Language Models (LLMs) can estimate sentence-level psycholinguistic norms, specifically memorability and reading times. The authors fine-tune LLMs to predict these norms and find that the fine-tuned models' estimates correlate with human-derived norms and outperform interpretable baseline predictors. Zero-shot and few-shot prompting, by contrast, yield inconsistent results, which highlights the limits of using prompted LLMs as direct proxies for human cognitive measures.

Key Contribution

Fine-tuning lets LLMs predict how memorable a sentence is and how long it takes to read, with estimates that exceed the predictive power of interpretable baseline predictors.

Abstract

Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. Meanwhile, for other norms such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationship between multiple words in a sentence-level context. Our results show that via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM-prompting as a proxy for human cognitive measures.
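The abstract evaluates model estimates by how well they correlate with human-derived norms and compares them against baseline predictors. A minimal sketch of that kind of comparison follows. It is not the authors' evaluation code: the function names are hypothetical, the abstract does not say which correlation coefficient is used (both Pearson and Spearman are shown here as common choices), and the numbers are made-up toy values, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Toy, invented per-sentence values (NOT data from the paper).
    human_norm = [0.62, 0.80, 0.55, 0.91, 0.70]
    model_estimate = [0.60, 0.75, 0.58, 0.88, 0.72]
    baseline_estimate = [0.70, 0.66, 0.61, 0.73, 0.64]

    for name, estimate in [("model", model_estimate),
                           ("baseline", baseline_estimate)]:
        print(name,
              "pearson=%.3f" % pearson(human_norm, estimate),
              "spearman=%.3f" % spearman(human_norm, estimate))
```

In a comparison of this shape, a predictor counts as better when its estimates correlate more strongly with the human norms on held-out sentences.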

Eval Frameworks & Benchmarks | Interpretability & Mechanistic Interp | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 44

Year: 2026

Venue: N/A
