Data Science Institute; Efi Arazi School of Computer Science; Vrije Universiteit Amsterdam

May 21, 2026

arXiv:2605.22247

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

Kai Golan Hashiloni, Daniel Fadlon, Lior Livyatan, Ofri Hefetz, Jiahuan Pei, Kfir Bar

AI Summary

The paper introduces IdioLink, a new retrieval benchmark to evaluate language models' ability to link idiomatic expressions to their literal or paraphrased meanings. The benchmark comprises 10,700 documents and 2,140 queries across 107 idioms, with annotations highlighting spans conveying core meaning. Experiments with strong embedding models (BGE, E5, Contriever, Qwen) reveal that they struggle to retrieve semantically equivalent meanings across idiomatic and literal expressions, indicating a reliance on surface-level cues.

Key Contribution

Current embedding models struggle to connect an idiom such as "break a leg" with a query about "wishing someone good luck," which points to limited semantic abstraction beyond surface wording.

Abstract

Idioms pose a fundamental challenge for language models, as their meaning cannot be inferred from surface form alone. Understanding such expressions, therefore, requires semantic abstraction beyond lexical overlap. We introduce IdioLink, a retrieval benchmark designed to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. IdioLink comprises 10,700 documents and 2,140 queries, spanning 107 idioms with both literal and figurative uses. Each document and query is annotated with spans that convey the core meaning. Evaluating strong embedding baselines (e.g., BGE, E5, Contriever, and Qwen), we show that current models struggle to retrieve equivalent meanings across divergent surface realizations, relying instead on topical and shallow semantic cues. IdioLink exposes key gaps in idiom-aware semantic retrieval and provides a challenging testbed for future models.
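The retrieval setting described in the abstract can be illustrated with a minimal sketch of dense retrieval scoring. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-made rather than produced by any of the evaluated models, the document labels and relevance set are hypothetical, and recall@k is used only as a common retrieval metric, not necessarily the one reported for IdioLink.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by decreasing cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Toy, hand-made "embeddings" (hypothetical; not output from a real model).
doc_vecs = {
    "idiomatic": [0.9, 0.1, 0.0],   # uses the idiom figuratively
    "paraphrase": [0.2, 0.9, 0.1],  # states the same meaning in plain words
    "literal": [0.8, 0.0, 0.6],     # same words, literal sense (a distractor)
}
query_vec = [1.0, 0.2, 0.1]  # query phrased with the idiom

# Assumed relevance for this toy example: documents sharing the figurative meaning.
relevant = {"idiomatic", "paraphrase"}

ranking = rank_documents(query_vec, doc_vecs)
print(ranking, recall_at_k(ranking, relevant, k=2))
```

In this toy setup the literal distractor outranks the paraphrase because its vector lies closer to the query, so recall@2 is 0.5. That is the kind of surface-driven failure the benchmark is designed to expose, shown here on invented numbers only.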

Eval Frameworks & Benchmarks; Natural Language Processing; Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
