This paper introduces Template Constrained Decoding (TeCoD), a system that leverages recurring query patterns in labeled Text-to-SQL workloads to improve accuracy and reduce latency. TeCoD uses a fine-tuned NLI model for template selection and grammar-constrained decoding to enforce the selected template during SQL generation. Experiments show that TeCoD achieves up to 36% higher execution accuracy than in-context learning and 2.2x lower latency on matched queries.
By exploiting the recurring query patterns in real-world workloads, Text-to-SQL systems can reach up to 36% higher execution accuracy than in-context learning and 2.2x lower latency on matched queries.
Large language models (LLMs) have revolutionized Text-to-SQL generation, allowing users to query structured data using natural language with growing ease. Yet, real-world deployment remains challenging, especially in complex or unseen schemas, due to inconsistent accuracy and the risk of generating invalid SQL. We introduce Template Constrained Decoding (TeCoD), a system that addresses these limitations by harnessing the recurrence of query patterns in labeled workloads. TeCoD converts historical NL-SQL pairs into reusable templates and introduces a robust template selection module that uses a fine-tuned natural language inference model to match or reject queries efficiently. Once the template is selected, TeCoD enforces it during SQL generation through grammar-constrained decoding, implemented via a novel partitioned strategy that ensures both syntactic validity and efficiency. Together, these components yield up to 36% higher execution accuracy than in-context learning (ICL) and 2.2× lower latency on matched queries.
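
To make the idea of a reusable template concrete, the following is a minimal Python sketch, not TeCoD's implementation. It assumes a template is simply a historical SQL string with its literals replaced by a slot, and it uses a regular-expression check on a finished query as a rough stand-in for grammar-constrained decoding, which in the paper operates during generation. The names `to_template`, `template_to_regex`, `conforms`, and the `<VAL>` slot marker are hypothetical, and the fine-tuned NLI selection step is not modeled at all.

```python
import re

# Assumption: "literals" are single-quoted strings or standalone numbers.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def to_template(sql: str) -> str:
    """Replace every literal in a historical SQL query with a <VAL> slot."""
    return LITERAL.sub("<VAL>", sql)


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template into a regex that accepts any literal in each slot."""
    fixed_parts = template.split("<VAL>")
    slot = f"(?:{LITERAL.pattern})"
    # Fixed SQL text must match exactly; only the slots may vary.
    return re.compile(slot.join(re.escape(part) for part in fixed_parts))


def conforms(sql: str, template: str) -> bool:
    """Check whether a finished SQL string follows the template's structure."""
    return template_to_regex(template).fullmatch(sql) is not None


historical_sql = "SELECT name FROM users WHERE age > 30 AND city = 'Paris'"
template = to_template(historical_sql)

# A new query with the same structure but different literals is accepted.
same_shape = "SELECT name FROM users WHERE age > 42 AND city = 'Rome'"
# A query with a different structure is rejected.
other_shape = "SELECT name FROM users WHERE age > 42"

print(template)
print(conforms(same_shape, template), conforms(other_shape, template))
```

In this sketch a matched template leaves only the slot values free, which gives some intuition for why constraining generation to a selected template can both rule out invalid SQL and reduce the amount of text the model has to produce.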