This paper tackles the credit assignment problem in tool-integrated Text-to-SQL by introducing FineStep, a framework that provides step-level rewards and advantages. FineStep uses independent process rewards to combat the signal sparsity of outcome-only supervision, and a step-level credit assignment mechanism to evaluate each reasoning step. Experiments on the BIRD benchmark show that FineStep achieves state-of-the-art performance, with an average execution accuracy (EX) gain of 3.25% over GRPO at the 4B scale, while also reducing redundant tool interactions.
Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.
Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.
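The credit assignment problem described in the abstract can be made concrete with a toy comparison. The sketch below is not FineStep's actual formulation, which the abstract does not specify. It contrasts a GRPO-style group-normalized outcome advantage (one scalar per trajectory) with a hypothetical step-level variant in which each step's return is its own process reward plus the trajectory's outcome reward. The function names, the reward values, and the additive combination rule are all illustrative assumptions.

```python
from statistics import mean, pstdev


def grpo_advantages(outcome_rewards, eps=1e-8):
    """GRPO-style advantage: normalize each trajectory's outcome reward
    against the group mean and (population) standard deviation."""
    mu = mean(outcome_rewards)
    sigma = pstdev(outcome_rewards)
    return [(r - mu) / (sigma + eps) for r in outcome_rewards]


def step_level_advantages(step_rewards, outcome_rewards, eps=1e-8):
    """Hypothetical step-level advantage (an assumption, not the paper's method):
    each step's return = its process reward + the trajectory's outcome reward,
    normalized over every step in the group."""
    returns = [
        [p + outcome for p in steps]
        for steps, outcome in zip(step_rewards, outcome_rewards)
    ]
    flat = [g for traj in returns for g in traj]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(g - mu) / (sigma + eps) for g in traj] for traj in returns]


# Toy group of three rollouts for one question (all values are made up).
# A: correct answer in two useful steps.
# B: correct answer, but with two redundant tool calls in the middle.
# C: wrong answer.
outcome_rewards = [1.0, 1.0, 0.0]
step_rewards = [
    [0.5, 0.5],
    [0.5, -0.5, -0.5, 0.5],
    [0.5, -0.5],
]

outcome_adv = grpo_advantages(outcome_rewards)
step_adv = step_level_advantages(step_rewards, outcome_rewards)

print("outcome-level:", outcome_adv)
print("step-level:   ", step_adv)
```

Under outcome-only supervision, rollouts A and B receive the same advantage because both end in a correct answer. In the step-level sketch, B's redundant steps score lower than A's steps, which is the kind of distinction the abstract argues outcome rewards cannot express.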