This paper addresses the challenge of cross-domain slot filling in spoken language understanding by using LLMs to generate synthetic data for data-scarce target domains. The authors introduce a two-stage data generation strategy in which LLMs synthesize samples, together with a data curation mechanism based on confidence and uncertainty that filters out low-quality samples. Experiments demonstrate the effectiveness and generality of the proposed approach in enhancing cross-domain slot filling performance.
LLMs can close the data gap in cross-domain slot filling, but only if their synthetic data is carefully curated using confidence and uncertainty metrics.
In real-world scenarios, cross-domain slot filling in spoken language understanding remains a significant challenge due to data scarcity. Previous works focus on supplementing sequence labeling models with slot meta-information or metric learning, but they generalize poorly because they lack domain-specific knowledge. To enhance generalization, recent studies introduce implicit general knowledge, via further pretraining or larger-parameter generative models, to improve performance on slots that lack domain-specific knowledge. However, this knowledge is domain-agnostic and cannot provide comprehensive knowledge for the target domain. Therefore, we propose a two-stage data generation strategy that uses powerful LLMs to synthesize samples, introducing knowledge for each slot of the data-scarce target domain. More importantly, we employ a data curation mechanism based on confidence and uncertainty to identify and filter out low-quality samples, yielding a high-quality synthetic dataset. Extensive experimental results demonstrate the effectiveness and generality of our approach.
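The curation idea can be sketched concretely. The abstract does not give the paper's exact scoring formulas, so the sketch below makes illustrative assumptions: confidence is taken as the mean probability a slot tagger assigns to its argmax label per token, uncertainty as the mean Shannon entropy of the per-token label distributions, and both thresholds (`conf_min`, `unc_max`) are hypothetical values, not the authors'.

```python
import math

def confidence(token_probs):
    """Mean probability of the argmax label across a sample's tokens."""
    return sum(max(p) for p in token_probs) / len(token_probs)

def uncertainty(token_probs):
    """Mean Shannon entropy of the per-token label distributions."""
    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)
    return sum(entropy(p) for p in token_probs) / len(token_probs)

def curate(samples, conf_min=0.8, unc_max=0.5):
    """Keep synthetic samples the tagger labels confidently and consistently.

    samples: list of (text, token_probs) pairs, where token_probs is a
    list of per-token label probability distributions.
    """
    return [
        text
        for text, token_probs in samples
        if confidence(token_probs) >= conf_min
        and uncertainty(token_probs) <= unc_max
    ]

# Example: a confident sample is kept, a low-confidence one is filtered out.
samples = [
    ("book a flight to Paris", [[0.95, 0.05], [0.90, 0.10]]),
    ("noisy synthetic output", [[0.55, 0.45], [0.60, 0.40]]),
]
kept = curate(samples)
```

Here the first sample passes (confidence 0.925, mean entropy about 0.26 nats) while the second fails the confidence threshold, so only the first text survives curation.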