Tsinghua AI, Baidu, ECNU, GIST Guangdong | Feb 15, 2026 | arXiv:2602.14060

LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts

Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li, Lingyong Yan

AI Summary

The paper introduces LM-Lexicon, a definition modeling approach that trains specialized language models as semantic domain experts and merges them using a sparse mixture-of-experts architecture. This approach decomposes the definition modeling task into specialized semantic domains via data clustering and semantic expert learning. LM-Lexicon achieves a 7% BLEU score improvement over prior state-of-the-art models across five benchmarks, demonstrating the effectiveness of fine-grained expert specialization and semantic-aware routing.

Key Contribution

Instead of a single monolithic model, LM-Lexicon trains small language models as experts on clustered semantic domains and merges them into a sparse mixture-of-experts model, improving BLEU by 7% over the prior state-of-the-art model on five benchmarks.

Abstract

We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements (+7% BLEU score compared with the prior state-of-the-art model) over existing methods on five widely used benchmarks. Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. Our work advances definition modeling while providing insights into the development of efficient language models for semantic-intensive applications.
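The contrast between domain-level and token-level routing mentioned in the abstract can be illustrated with a toy sketch. The abstract does not describe the actual router, so everything here is an assumption made for illustration: the cluster centroids, the cosine-similarity scoring, the mean pooling, and the function names `route_domain_level` and `route_token_level` are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_pool(token_vecs):
    """Average the token vectors into a single input-level vector."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

def nearest_centroid(vec, centroids):
    """Index of the centroid most similar to vec."""
    return max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))

def route_domain_level(token_vecs, centroids):
    """Assumed domain-level routing: one expert is chosen for the whole input."""
    return nearest_centroid(mean_pool(token_vecs), centroids)

def route_token_level(token_vecs, centroids):
    """Assumed token-level routing: each token picks its own expert."""
    return [nearest_centroid(t, centroids) for t in token_vecs]

# Toy data: two hypothetical semantic-domain centroids and three token vectors.
centroids = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.3]]

domain_expert = route_domain_level(tokens, centroids)
token_experts = route_token_level(tokens, centroids)
print(domain_expert, token_experts)
```

In this toy setup, domain-level routing sends the entire input to one expert, while token-level routing may split a single input across several experts. The sketch only shows the difference in granularity and says nothing about which choice performs better.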

Architecture Design (Transformers, SSMs, MoE); Eval Frameworks & Benchmarks; Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
