This paper introduces a new eye-tracking dataset of Portuguese L1 speakers of English processing idiomatic expressions, designed to quantify the cognitive effort associated with L2 idiom comprehension across CEFR proficiency levels (A1-C2). Analysis of the dataset, collected using accessible 60 Hz eye-tracking hardware, reveals a strong inverse correlation between language proficiency and regressive eye movements during idiom processing, which supports the dataset's validity. The dataset is intended as a benchmark for evaluating cognitive plausibility in both human processing models and large language models.
L2 learners' struggles with idioms, captured in a new eye-tracking dataset, offer a cognitively grounded benchmark for evaluating how well LLMs truly "understand" figurative language.
This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling Idiomaticity in Human and Artificial Language Processing) initiative, this dataset serves as a cognitively grounded benchmark for evaluating both human processing models and the alignment of large language models with human-like figurative understanding.
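To make the sampling-rate argument concrete, the following minimal sketch shows the arithmetic behind it and one simple way a regression count could be derived from a sequence of fixated word positions. It is not the authors' pipeline: the function names, the 200 ms example fixation duration, and the definition of a regression as any fixation landing on an earlier word than the previous one are all assumptions made for illustration.

```python
from typing import Sequence

SAMPLING_RATE_HZ = 60  # sampling rate stated in the abstract


def samples_per_event(duration_ms: float, rate_hz: float = SAMPLING_RATE_HZ) -> int:
    """Return how many whole gaze samples fall within an event of this duration."""
    return int(duration_ms * rate_hz / 1000)


def count_regressions(fixated_word_indices: Sequence[int]) -> int:
    """Count fixations that land on an earlier word than the preceding fixation.

    Assumed definition for illustration; the paper may define regressions differently.
    """
    pairs = zip(fixated_word_indices, fixated_word_indices[1:])
    return sum(1 for previous, current in pairs if current < previous)


# Hypothetical inputs, not data from the study.
example_fixation_ms = 200             # illustrative fixation duration (assumption)
example_scanpath = [0, 1, 2, 1, 2, 3, 0]  # word index fixated at each step

print(samples_per_event(example_fixation_ms))  # samples covering one fixation
print(count_regressions(example_scanpath))     # backward jumps in the scanpath
```

Under these assumptions, a fixation of a few hundred milliseconds spans on the order of ten samples at 60 Hz, which is the sense in which the rate can resolve fixation-level and regression-level events even though it is too coarse for fine-grained saccade dynamics.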