Georgetown | Tohoku | UCSD | University of Artificial Intelligence | UTokyo | Apr 20, 2026 | arXiv:2604.18563

Dual Alignment Between Language Model Layers and Human Sentence Processing

Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox

AI Summary

This study investigates how different layers of large language models (LLMs) align with human cognitive effort during sentence processing, particularly in syntactically ambiguous constructions in English. Later layers estimate this effort better than earlier layers, but they still underestimate the human data. This dual alignment points to different modes of sentence processing: naturalistic reading relies on somewhat weak predictions, akin to earlier layers, while syntactically challenging processing requires more fully contextualized representations, better modeled by later layers.

Key Contribution

Later layers of LLMs estimate cognitive effort in syntactically challenging sentences better than earlier layers, but still underestimate the effort observed in human data.

Abstract

A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal from early layers of large language models (LLMs). This raises the question of whether such advantages of internal layers extend to more syntactically challenging constructions, where surprisal has been reported to underestimate human cognitive effort. In this paper, we first explore which internal layers better estimate the human cognitive effort observed in syntactic ambiguity processing in English. Our experiments show that, in contrast to naturalistic reading, later layers better estimate this cognitive effort, but still underestimate the human data. This dual alignment sheds light on different modes of sentence processing in humans and LMs: naturalistic reading employs a somewhat weak prediction akin to earlier layers of LMs, while syntactically challenging processing requires more fully contextualized representations, better modeled by later layers of LMs. Motivated by these findings, we also explore several probability-update measures using shallow and deep layers of LMs, showing an advantage complementary to single-layer surprisal in reading time modeling.
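As a rough illustration of the quantities the abstract refers to, the sketch below computes surprisal, the negative log probability of the observed word, from two toy next-word distributions standing in for a shallow and a deep layer. The distributions, the candidate words, and the `surprisal_update` function are hypothetical and chosen only for illustration; the abstract does not define the paper's probability-update measures, so the difference shown here is an assumed stand-in, not the authors' method.

```python
import math


def surprisal(dist, token):
    """Surprisal in bits: -log2 p(token) under a next-word distribution."""
    return -math.log2(dist[token])


def surprisal_update(shallow, deep, token):
    """Illustrative update measure (an assumption, not the paper's definition):
    how much the observed word's surprisal changes from the shallow-layer
    distribution to the deep-layer distribution."""
    return surprisal(deep, token) - surprisal(shallow, token)


# Hypothetical next-word distributions read out from two layers.
# The numbers are made up; each distribution sums to 1.
shallow = {"raced": 0.25, "fell": 0.25, "was": 0.5}
deep = {"raced": 0.5, "fell": 0.125, "was": 0.375}

observed = "fell"
print(surprisal(shallow, observed))               # surprisal under the shallow layer
print(surprisal(deep, observed))                  # surprisal under the deep layer
print(surprisal_update(shallow, deep, observed))  # change between the two layers
```

In a reading-time model, values like these would typically enter as per-word predictors; how the paper combines shallow and deep layers is not stated in the abstract.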

Interpretability & Mechanistic Interp | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
