This paper introduces SQLStructEval, a framework to evaluate the structural reliability of LLM-generated SQL queries using canonical abstract syntax tree (AST) representations. Experiments on the Spider benchmark reveal that LLMs often generate structurally diverse SQL queries for the same input, even with correct execution, and are sensitive to surface-level input variations. The authors demonstrate that a compile-style pipeline for structured SQL generation improves both execution accuracy and structural consistency.
LLMs may nail Text-to-SQL execution accuracy, but SQLStructEval reveals that they often generate wildly different query structures for the same question, raising serious reliability concerns.
Despite strong performance on Text-to-SQL benchmarks, it remains unclear whether LLM-generated SQL programs are structurally reliable. In this work, we investigate the structural behavior of LLM-generated SQL queries and introduce SQLStructEval, a framework for analyzing program structures through canonical abstract syntax tree (AST) representations. Our experiments on the Spider benchmark show that modern LLMs often produce structurally diverse queries for the same input, even when execution results are correct, and that such variance is frequently triggered by surface-level input changes such as paraphrases or schema presentation. We further show that generating queries in a structured space via a compile-style pipeline can improve both execution accuracy and structural consistency. These findings suggest that structural reliability is a critical yet overlooked dimension for evaluating LLM-based program generation systems. Our code is available at https://anonymous.4open.science/r/StructEval-2435.
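The core idea of comparing queries by canonical structure rather than raw text can be illustrated with a toy sketch. This is not the authors' framework (which operates on full abstract syntax trees); it is a simplified, token-level canonicalizer written for this summary, assuming only that surface noise like letter case, whitespace, and alias names should be factored out before comparing structure.

```python
import re

def canonical_tokens(sql: str) -> list[str]:
    """Rough canonical form of a SQL query: lowercase tokens and
    rename aliases introduced by AS in order of first appearance
    (t0, t1, ...), so surface-level variation is factored out."""
    tokens = [t.lower() for t in
              re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[^\sA-Za-z_0-9]", sql)]
    alias_map: dict[str, str] = {}
    for i, tok in enumerate(tokens):
        if i > 0 and tokens[i - 1] == "as":
            alias_map.setdefault(tok, f"t{len(alias_map)}")
    return [alias_map.get(t, t) for t in tokens]

# Two execution-equivalent queries that differ only at the surface:
q1 = "SELECT name FROM singer AS s WHERE s.age > 30"
q2 = "select name from singer as x where x.age > 30"
# A structurally different query (no alias):
q3 = "SELECT name FROM singer WHERE age > 30"

print(canonical_tokens(q1) == canonical_tokens(q2))  # True: surface-only differences
print(canonical_tokens(q1) == canonical_tokens(q3))  # False: structures differ
```

Under this kind of canonicalization, two generations count as structurally consistent only if their canonical forms match, which is the property SQLStructEval measures at the AST level.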