The paper introduces IdioLink, a retrieval benchmark that evaluates whether language models can link idiomatic expressions to their literal or paraphrased meanings. The benchmark comprises 10,700 documents and 2,140 queries across 107 idioms, and each document and query is annotated with the spans that convey its core meaning. Experiments with strong embedding models (BGE, E5, Contriever, Qwen) show that they struggle to retrieve semantically equivalent content when the idiomatic and literal wordings diverge, relying instead on topical and shallow semantic cues.
Current embedding models struggle to link "break a leg" to documents about wishing someone good luck, revealing a gap in semantic abstraction.
Idioms pose a fundamental challenge for language models, as their meaning cannot be inferred from surface form alone. Understanding such expressions, therefore, requires semantic abstraction beyond lexical overlap. We introduce IdioLink, a retrieval benchmark designed to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. IdioLink comprises 10,700 documents and 2,140 queries, spanning 107 idioms with both literal and figurative uses. Each document and query is annotated with spans that convey the core meaning. Evaluating strong embedding baselines (e.g., BGE, E5, Contriever, and Qwen), we show that current models struggle to retrieve equivalent meanings across divergent surface realizations, relying instead on topical and shallow semantic cues. IdioLink exposes key gaps in idiom-aware semantic retrieval and provides a challenging testbed for future models.
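The retrieval setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's evaluation code: the three-dimensional vectors, the document IDs, and the choice of cosine similarity with recall@k are all assumptions made for illustration, and the abstract does not state which metrics IdioLink reports. The toy query vector is deliberately built to resemble a document that shares surface wording with the idiom, which is the failure mode the abstract describes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document IDs sorted by descending similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    # Sort by score (highest first); break ties by ID for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in scored]


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical embeddings (made-up numbers, not produced by any real model).
doc_vecs = {
    "d_literal_paraphrase": [0.9, 0.1, 0.0],  # e.g. text about wishing someone good luck
    "d_surface_overlap": [0.1, 0.9, 0.1],     # e.g. text about an actual broken leg
    "d_unrelated": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a query that uses "break a leg" figuratively,
# constructed so that surface wording dominates the representation.
query_vec = [0.3, 0.8, 0.1]

ranked = rank_documents(query_vec, doc_vecs)
relevant = {"d_literal_paraphrase"}

print(ranked)
print(recall_at_k(ranked, relevant, k=1))
print(recall_at_k(ranked, relevant, k=2))
```

In this toy case the document with lexical overlap outranks the one with the equivalent meaning, so recall@1 is 0 even though the relevant document is in the collection. A benchmark like IdioLink measures how often real embedding models behave this way.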