This paper introduces LLM4MEM, a novel framework leveraging LLMs to improve multi-table entity matching (MEM) by addressing semantic inconsistencies and efficiency challenges. LLM4MEM incorporates a multi-style prompt-enhanced LLM attribute coordination module, a transitive consensus embedding matching module, and a density-aware pruning module. Experiments on six MEM datasets demonstrate that LLM4MEM achieves an average F1 improvement of 5.1% compared to baseline models.
LLMs can substantially improve multi-table entity matching by coordinating attribute semantics, embedding entities for pre-matching, and pruning noisy candidates.
Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle with semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to resolve semantic inconsistencies. Then, to mitigate the matching efficiency problem caused by the surge in entity counts across multiple data sources, we develop a transitive consensus embedding matching module for entity embedding and pre-matching. Finally, to handle noisy entities during matching, we introduce a density-aware pruning module to improve the quality of multi-table entity matching. We conducted extensive experiments on six MEM datasets, and the results show that our model improves F1 by an average of 5.1% over the baseline models. Our code is available at https://github.com/Ymeki/LLM4MEM.
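The abstract does not detail how the embedding matching and pruning modules operate; as a minimal illustrative sketch only (not the authors' implementation), one plausible reading of "transitive consensus" pre-matching and "density-aware" pruning is: link entity pairs whose embedding similarity exceeds a threshold, take the transitive closure via union-find so linked pairs form consensus clusters, then drop cluster members with low average similarity to the rest. All function names, thresholds, and the toy embeddings below are assumptions for illustration.

```python
# Illustrative sketch of embedding-based pre-matching with transitive
# closure, plus a density-style pruning filter. This is NOT the paper's
# actual LLM4MEM implementation; thresholds and names are assumptions.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def _find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_entities(embs, sim_thresh=0.9):
    """Pre-match pairs above sim_thresh, then take the transitive
    closure: if a~b and b~c, then a, b, c share one cluster."""
    n = len(embs)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embs[i], embs[j]) >= sim_thresh:
                parent[_find(parent, i)] = _find(parent, j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(_find(parent, i), []).append(i)
    return list(clusters.values())

def prune_low_density(cluster, embs, dens_thresh=0.8):
    """Drop members whose mean similarity to the rest of the cluster is
    low -- a toy stand-in for density-aware pruning of noisy entities."""
    kept = []
    for i in cluster:
        others = [j for j in cluster if j != i]
        if not others:
            kept.append(i)
            continue
        density = sum(cosine(embs[i], embs[j]) for j in others) / len(others)
        if density >= dens_thresh:
            kept.append(i)
    return kept
```

For example, with two nearly parallel embeddings and one orthogonal one, `cluster_entities` groups the first two together and isolates the third; `prune_low_density` then keeps only members that are similar to the rest of their cluster.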