PKU · Tencent AI · ZJU · Jun 4, 2026 · arXiv:2606.05836

ProSPy: A Profiling-Driven SQL-Python Agentic Framework for Enterprise Text-to-SQL

Zhaorui Yang, Huawei Zheng, Sen Yang, Yuhui Zhang, Zhizhen Yu, Xuan Yi, Chen Hou, Defeng Xie, Chao Hu, Minfeng Zhu, Dazhen Deng, Haozhe Feng, Danqing Huang, Yingcai Wu, Peng Chen, Wei Chen

AI Summary

This paper introduces ProSPy, a profiling-driven SQL-Python agentic framework for Text-to-SQL over enterprise-scale databases, targeting challenges such as large heterogeneous schemas, incomplete metadata, and dialect-specific SQL syntax. The framework structures reasoning into four stages: automatic profiling, progressive schema pruning, fetching intermediate views through a dialect-agnostic SQL interface, and flexible Python-based analysis. On Spider 2.0-Lite and Spider 2.0-Snow, ProSPy outperforms strong baselines, reaching execution accuracies of 60.15% and 60.51% with Claude-4.5-Opus, and remains robust across SQL dialects.

Key Contribution

ProSPy reaches over 60% execution accuracy on both Spider 2.0-Lite and Spider 2.0-Snow without majority voting, combining the efficiency of SQL over large databases with the flexibility of Python-based analysis.

Abstract

Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-specific SQL syntax, and complex analytical questions that are difficult to solve with a single SQL query. To address these challenges, we propose ProSPy, a Profiling-driven SQL-Python agentic framework for enterprise-scale Text-to-SQL. ProSPy structures the reasoning process into four stages: it first extracts fine-grained data evidence through automatic profiling, progressively prunes large schemas into task-relevant contexts, fetches intermediate views through a dialect-agnostic SQL interface, and finally performs flexible downstream analysis with Python. This design combines the efficiency of SQL over large databases with the flexibility of Python-based analysis, while reducing reliance on unreliable metadata and improving robustness across SQL dialects. Experiments on Spider 2.0-Lite and Spider 2.0-Snow show that ProSPy consistently outperforms strong baselines with both open-source and proprietary models, achieving execution accuracies of 60.15% and 60.51% with Claude-4.5-Opus, without majority voting. Further analysis shows that ProSPy is robust to SQL dialect variations and achieves a favorable trade-off between schema recall and precision.
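
The four stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the in-memory SQLite database, the table and column names, the keyword-matching pruning rule (which stands in for the LLM-driven reasoning), and every function name below are assumptions made only to show the shape of a profile, prune, fetch, analyze pipeline.

```python
import sqlite3
from statistics import mean


def build_demo_db():
    """Hypothetical two-table database; only one table is relevant to the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, note TEXT);
        CREATE TABLE audit_log (id INTEGER, payload TEXT);
        INSERT INTO orders VALUES
            (1, 'EMEA', 120.0, NULL), (2, 'EMEA', 80.0, NULL),
            (3, 'APAC', 200.0, 'rush'), (4, 'APAC', 100.0, NULL);
        INSERT INTO audit_log VALUES (1, 'login'), (2, 'logout');
    """)
    return conn


def profile(conn):
    """Stage 1: gather per-column evidence from the data itself, not from metadata."""
    profiles = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
        profiles[table] = {}
        for col in cols:
            distinct, nulls = conn.execute(
                f'SELECT COUNT(DISTINCT "{col}"), SUM("{col}" IS NULL) FROM "{table}"'
            ).fetchone()
            samples = [r[0] for r in conn.execute(
                f'SELECT DISTINCT "{col}" FROM "{table}" '
                f'WHERE "{col}" IS NOT NULL ORDER BY 1 LIMIT 3')]
            profiles[table][col] = {
                "distinct": distinct, "nulls": nulls, "samples": samples}
    return profiles


def prune(profiles, question):
    """Stage 2: keep columns whose name or sampled values occur in the question.

    Naive keyword matching is an assumption standing in for model-based pruning.
    """
    words = set(question.lower().replace("?", "").split())
    kept = {}
    for table, cols in profiles.items():
        hits = [c for c, p in cols.items()
                if c.lower() in words
                or any(str(s).lower() in words for s in p["samples"])]
        if hits:
            kept[table] = hits
    return kept


def fetch(conn, table, columns):
    """Stage 3: pull a narrow intermediate view using only basic, portable SQL."""
    col_list = ", ".join(f'"{c}"' for c in columns)
    cur = conn.execute(f'SELECT {col_list} FROM "{table}"')
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]


def analyze(rows):
    """Stage 4: finish the analysis in Python rather than in dialect-specific SQL."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(vals) for region, vals in groups.items()}


def answer(question):
    conn = build_demo_db()
    kept = prune(profile(conn), question)
    table, columns = next(iter(kept.items()))
    return analyze(fetch(conn, table, columns))


result = answer("What is the average amount per region?")
print(result)
```

In this sketch the SQL stage is restricted to a plain column projection, so the grouping and averaging that would otherwise need dialect-specific syntax happen in Python. That split mirrors the trade-off the abstract describes, though how ProSPy itself divides work between the two stages is not specified here.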

Natural Language Processing · Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
