Foshan | Apr 23, 2026 | arXiv:2604.21414

SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis

Qiang Gao, Zhenping Li, Anqi Zhuo, Yingxiao Zhao, Weibo Geng, Xiaosong Li

AI Summary

The paper introduces SemanticAgent, a framework for text-to-SQL data synthesis that focuses on semantic validity, unlike existing methods that primarily rely on executability. SemanticAgent employs a three-stage protocol involving semantic analysis, stepwise synthesis, and diagnostic refinement to generate higher-quality synthetic data. Experiments show that fine-tuning on data synthesized by SemanticAgent leads to improved performance, particularly on semantically challenging benchmarks, compared to data from prior synthesis methods.

Key Contribution

Stop generating text-to-SQL training data that *runs* but is semantically wrong: this new framework finally aligns synthesis with database semantics.

Abstract

Existing text-to-SQL synthesis pipelines still conflate executability with semantic validity: syntactic checks and execution-based validation can retain queries that execute successfully while violating database semantics. To address this limitation, we propose SemanticAgent, a semantics-aware synthesis framework. SemanticAgent organizes synthesis around three specialized modules: an analyzer, a synthesizer, and a verifier. Through a three-stage protocol of semantic analysis, stepwise synthesis, and diagnostic refinement, SemanticAgent replaces reliance on execution-based validation alone with a traceable reasoning process. Our framework generates synthetic data that consistently outperforms data from prior synthesis methods under semantic-quality evaluation, leading to stronger downstream fine-tuning performance, especially on semantically demanding benchmarks.
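
To make the gap between executability and semantic validity concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is an illustration only, not the paper's method: the toy schema, both queries, and the helpers `executes` and `join_uses_foreign_key` are assumptions introduced here. Both queries pass an execution check, but only one joins along the declared foreign key.

```python
import sqlite3

# Toy schema (hypothetical, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
INSERT INTO orders VALUES (1, 2, 10.0), (2, 2, 5.0), (3, 1, 7.0);
""")

# Question: "What is the total order value per customer?"
# This query runs, but joins the order's own id to the customer id.
wrong_sql = (
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.id = c.id GROUP BY c.name ORDER BY c.name"
)
# This query joins along the declared foreign key.
right_sql = (
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.name ORDER BY c.name"
)


def executes(sql: str) -> bool:
    """Execution-based validation: accept any query that runs without error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def join_uses_foreign_key(conn, child_table, child_col, parent_table, parent_col):
    """Toy semantic check: is this join pair a declared foreign key?"""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    rows = conn.execute(f"PRAGMA foreign_key_list({child_table})").fetchall()
    return any(
        r[2] == parent_table and r[3] == child_col and r[4] == parent_col
        for r in rows
    )


print(executes(wrong_sql), executes(right_sql))
print(conn.execute(wrong_sql).fetchall())
print(conn.execute(right_sql).fetchall())
print(join_uses_foreign_key(conn, "orders", "id", "customers", "id"))
print(join_uses_foreign_key(conn, "orders", "customer_id", "customers", "id"))
```

An execution-only filter would keep both queries, even though they return different totals per customer. The foreign-key check is a deliberately narrow stand-in for the kind of schema-level reasoning a semantic verifier might perform; real semantic validation would need to cover far more than join conditions.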

Code Generation & Program Synthesis | Data Curation & Synthetic Data | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 52

Year: 2026

Venue: N/A
