May 6, 2026 · arXiv:2605.04719

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

Yaxun Dai, Yingqi Gao, Xuemei Dong, Meng Chu, Pingfu Chao

AI Summary

This paper tackles the credit assignment problem in tool-integrated Text-to-SQL by introducing FineStep, a framework that provides step-level rewards and advantages. FineStep uses independent process rewards to counter the sparsity of outcome-only signals, and a step-level credit assignment mechanism to quantify the value of each reasoning step. Experiments on the BIRD benchmark show that FineStep achieves state-of-the-art performance, improving average execution accuracy (EX) by 3.25% over GRPO at the 4B scale while reducing redundant tool interactions.

Key Contribution

Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.

Abstract

Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.
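The credit assignment problem described above can be illustrated with a small sketch. The first function computes the standard group-relative (GRPO-style) advantage, which gives one scalar per trajectory that every step in it shares. The second is a hypothetical step-level variant written only for illustration: it is not FineStep's actual formulation, which the abstract does not specify. The function names, the additive combination of process and outcome rewards, and the reward values are all assumptions.

```python
from statistics import mean, pstdev


def grpo_advantages(outcome_rewards, eps=1e-8):
    """Group-relative advantage: one scalar per trajectory, shared by all its steps."""
    mu = mean(outcome_rewards)
    sigma = pstdev(outcome_rewards)
    return [(r - mu) / (sigma + eps) for r in outcome_rewards]


def step_level_advantages(step_rewards, outcome_rewards, eps=1e-8):
    """Hypothetical step-level variant (an assumption, not the paper's method).

    Each step's return is its own process reward plus the trajectory's
    outcome reward, normalised over every step in the sampled group.
    """
    returns = [
        [p + outcome for p in steps]
        for steps, outcome in zip(step_rewards, outcome_rewards)
    ]
    flat = [x for traj in returns for x in traj]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(x - mu) / (sigma + eps) for x in traj] for traj in returns]


# Three sampled trajectories for one question (made-up numbers).
# Trajectories 0 and 1 both reach the correct SQL; trajectory 1 takes
# two redundant tool calls on the way. Trajectory 2 is incorrect.
outcome_rewards = [1.0, 1.0, 0.0]
step_rewards = [
    [0.2, 0.2],              # concise and correct
    [0.2, -0.5, -0.5, 0.2],  # correct, with two redundant tool calls
    [0.2, -0.5],             # incorrect
]

traj_adv = grpo_advantages(outcome_rewards)
step_adv = step_level_advantages(step_rewards, outcome_rewards)

print("trajectory-level:", traj_adv)
print("step-level:", step_adv)
```

With outcome-only supervision, trajectories 0 and 1 receive identical advantages, so the redundant tool calls in trajectory 1 are reinforced as strongly as its useful steps. In the step-level sketch, those redundant steps receive a lower advantage than the useful steps in the same trajectory.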

Code Generation & Program Synthesis · Reasoning & Chain-of-Thought · Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 55
Year: 2026
Venue: N/A
