Tencent AI, ZJU · May 28, 2026 · arXiv:2605.29670

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang, H. Feng, Xuan Yi, Chaoyi Hu, Defeng Xie, Chen Hou, Danqing Huang, Wei Chen, Yingcai Wu, Dazhen Deng

AI Summary

The paper introduces EviLink, a novel schema linking approach for Text-to-SQL that reframes the task as uncertainty-aware schema-need inference across multiple plausible SQL paths. EviLink distinguishes between required and path-dependent uncertain schema items, strategically acquiring evidence only when necessary to resolve ambiguities. Experiments on BIRD-Dev and Spider2-Snow demonstrate that EviLink improves schema completeness and relevance while reducing token costs, achieving a 90.15% field-level strict recall rate on Spider2-Snow.
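A minimal sketch of the required-versus-uncertain distinction, assuming each candidate SQL path can be represented as the set of schema fields it would use. Fields needed by every path are treated as required. Fields needed by only some paths are path-dependent and uncertain, and only those go through an evidence step. The function names and the `acquire_evidence` callback are hypothetical illustrations, not EviLink's interface, and the set-intersection rule is a simplification of the paper's multi-hypothesis grounding.

```python
def partition_schema_needs(paths: list[set[str]]) -> tuple[set[str], set[str]]:
    """Split schema fields across candidate SQL paths (hypothetical model).

    required:  fields used by every candidate path
    uncertain: fields used by some, but not all, paths
    """
    if not paths:
        return set(), set()
    required = set.intersection(*paths)
    uncertain = set.union(*paths) - required
    return required, uncertain


def link_schema(paths: list[set[str]], acquire_evidence) -> tuple[set[str], int]:
    """Keep required fields outright; consult evidence only for uncertain ones.

    acquire_evidence is a hypothetical callback (e.g. inspecting sample
    values or column descriptions) that returns True to keep a field.
    Returns the linked schema and the number of evidence calls made.
    """
    required, uncertain = partition_schema_needs(paths)
    confirmed = {item for item in uncertain if acquire_evidence(item)}
    return required | confirmed, len(uncertain)


if __name__ == "__main__":
    # Two plausible readings of "orders by region":
    # region of the customer (needs a join) vs. region the order shipped to.
    paths = [
        {"orders.id", "orders.customer_id", "customers.region"},
        {"orders.id", "orders.ship_region"},
    ]
    required, uncertain = partition_schema_needs(paths)
    print(sorted(required))   # ['orders.id']
    print(sorted(uncertain))  # ['customers.region', 'orders.customer_id', 'orders.ship_region']
```

In this sketch, any token savings come from skipping the evidence step for required fields; how paths are generated and what form the evidence takes are left abstract.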

Key Contribution

Text-to-SQL systems can achieve higher downstream accuracy at lower token cost by reasoning about multiple plausible SQL paths and gathering evidence only when it is uncertain which schema elements are needed.

Abstract

Schema linking is a difficult and important step in large-scale Text-to-SQL, where systems must identify a compact yet sufficient schema context from large and ambiguous databases. Existing methods often treat schema linking as deterministic selection around a single SQL path, but complex questions may admit multiple valid realizations with different schema needs. We reframe schema linking as uncertainty-aware schema-need inference over multiple plausible SQL paths, where the system distinguishes required schema items from path-dependent uncertain ones and acquires evidence only where needed. We instantiate this reframing with EviLink, which combines multi-hypothesis schema grounding with uncertainty-guided evidence acquisition. Experiments on BIRD-Dev and Spider2-Snow show that this perspective improves the balance among schema completeness, schema relevance, and token cost. On Spider2-Snow, EviLink achieves 90.15% field-level strict recall rate, uses 123.30K average tokens, and improves downstream SQL generation under a fixed generator.
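The headline result is a field-level strict recall rate. Its exact definition is not given here; a common reading, assumed in this sketch, is that an example counts as a hit only if the predicted schema contains every gold field, and the rate is the fraction of such examples. A plain per-field recall is included for contrast. Both function names are hypothetical.

```python
def strict_recall_rate(predicted: list[set[str]], gold: list[set[str]]) -> float:
    """Assumed definition: share of examples whose predicted schema
    contains every gold field (all-or-nothing per example)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned per example")
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if g <= p)
    return hits / len(gold)


def field_recall(predicted: list[set[str]], gold: list[set[str]]) -> float:
    """Per-field (micro) recall for contrast: recovered gold fields / all gold fields."""
    total = sum(len(g) for g in gold)
    found = sum(len(g & p) for p, g in zip(predicted, gold))
    return found / total if total else 0.0


if __name__ == "__main__":
    gold = [{"t.a", "t.b"}, {"t.a", "t.d"}]
    predicted = [{"t.a", "t.b", "t.c"}, {"t.a"}]  # second example misses t.d
    print(strict_recall_rate(predicted, gold))  # 0.5
    print(field_recall(predicted, gold))        # 0.75
```

Under this assumed definition, one missing field fails the whole example, so strict recall rewards schema completeness more heavily than per-field recall does.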

Code Generation & Program Synthesis · Natural Language Processing · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0
Influential citations: 0
References: 27
Year: 2026
Venue: N/A
