SOMA-SQL addresses multi-source ambiguity in natural-language-to-SQL translation by combining synthetic query logs with ambiguity-driven probing. The method resolves ambiguities in user questions and database schemas autonomously, improving SQL generation accuracy without human intervention. In experiments, SOMA-SQL improves execution accuracy by 13.0% on average over state-of-the-art baselines, with gains of up to 16.7% on ambiguous questions.
Ambiguity in natural language queries can be resolved autonomously, improving SQL execution accuracy by 13.0% on average without human intervention.
Natural language interfaces to databases aim to translate user questions into executable SQL, yet remain brittle in real-world settings where questions are underspecified and schemas are large and ambiguous. Ambiguity across user questions, database schemas, and model interpretations is a central failure mode in NL2SQL, leading to misaligned intent, incorrect schema grounding, and erroneous SQL generation. Existing approaches rely on human clarification or treat ambiguity as a schema representation problem, but these neither scale nor resolve ambiguity autonomously. We propose SOMA-SQL to automatically resolve ambiguity via a targeted synthetic query log and ambiguity-driven probing. SOMA-SQL constructs a synthetic query log to ground schema interpretation and guide candidate SQL generation; it then executes targeted probing queries, driven by a structured ambiguity taxonomy and candidate disagreements, to produce disambiguation evidence for final SQL selection and repair. This active approach to ambiguity discovery and resolution generalizes across unseen schemas and query distributions without a human in the loop. Experiments on six public benchmarks demonstrate that SOMA-SQL improves execution accuracy by 13.0% on average over state-of-the-art baselines, with gains of up to 16.7% on ambiguous questions.
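To make the idea of disagreement-driven probing concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy SQLite table and two hand-written candidate queries that disagree on which column a literal should be matched against. The helper names (`run`, `probe_distinct`, `select_sql`), the candidate dictionary layout, and the rule "prefer the candidate whose filtered column actually contains the literal" are all hypothetical simplifications; the abstract does not specify the taxonomy, the probing queries, or the selection and repair logic.

```python
import sqlite3


def run(conn, sql):
    """Execute a candidate query; return its rows, or None if it fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def probe_distinct(conn, table, column, limit=50):
    """Hypothetical probing query: sample the distinct values of one column.

    Identifiers cannot be bound as parameters, so this sketch assumes
    table and column names come from a trusted schema.
    """
    sql = f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT {int(limit)}'
    return {row[0] for row in conn.execute(sql)}


def select_sql(conn, candidates, literal):
    """Pick one candidate, probing the database only when candidates disagree."""
    results = [run(conn, c["sql"]) for c in candidates]
    valid = [c for c, r in zip(candidates, results) if r is not None]
    if not valid:
        return None
    distinct_results = {repr(r) for r in results if r is not None}
    if len(distinct_results) <= 1:
        return valid[0]["sql"]  # candidates agree, so no probing is needed
    # Candidates disagree: gather evidence about where the literal really occurs.
    for c in valid:
        if literal in probe_distinct(conn, c["table"], c["column"]):
            return c["sql"]
    return valid[0]["sql"]  # no decisive evidence; fall back to the first candidate


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (name TEXT, city TEXT, state TEXT);
        INSERT INTO customers VALUES
            ('Ada', 'New York', 'NY'),
            ('Bob', 'Albany', 'NY'),
            ('Cy', 'Boston', 'MA');
        """
    )
    # Two readings of "How many customers are in NY?" (assumed for illustration).
    candidates = [
        {"sql": "SELECT COUNT(*) FROM customers WHERE city = 'NY'",
         "table": "customers", "column": "city"},
        {"sql": "SELECT COUNT(*) FROM customers WHERE state = 'NY'",
         "table": "customers", "column": "state"},
    ]
    chosen = select_sql(conn, candidates, "NY")
    return chosen, run(conn, chosen)


if __name__ == "__main__":
    print(demo())
```

In this toy case the two candidates return different counts, so the sketch probes both columns and keeps the `state` reading because only that column contains the value `NY`. A real system would need a richer set of probes and a repair step, which this sketch omits.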