This paper introduces Execution-Grounded Credit Assignment (EGCA), a technique that improves GRPO-style reinforcement learning for code generation by localizing reward signals using execution traces. EGCA compares the execution of a candidate program against a reference solution to identify the earliest semantic divergence, assigning credit only to the corresponding token span. Experiments on HumanEval and MBPP show that EGCA improves pass@1 by 3.1 and 1.5 percentage points respectively over standard GRPO, with minimal overhead.
Pinpointing the line of code responsible for a test failure boosts pass@1 by up to 3.1 points, without a critic or extra training.
Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single outcome signal is spread uniformly across long programs even when failure stems from a localized semantic error. We propose Execution-Grounded Credit Assignment (EGCA), which localizes GRPO updates using execution traces. For programs that satisfy algorithmic constraints but fail tests, EGCA executes the candidate and a canonical reference solution (curated once offline; used for analysis, not supervision) under identical instrumentation, identifies the earliest semantic divergence, and assigns advantage only to the corresponding token span while masking downstream tokens. EGCA is a drop-in modification requiring no critic, auxiliary loss, or learned verifier, yielding 82.1% pass@1 on HumanEval (+3.1 over GRPO) and 68.9% on MBPP (+1.5) with 18% wall-clock overhead.
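The credit-localization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes execution traces are lists of `(line_number, variable_state)` snapshots, and the function names (`earliest_divergence`, `localized_advantages`) and the `line_to_token_span` mapping are hypothetical.

```python
def earliest_divergence(cand_trace, ref_trace):
    """Return the index of the first step whose semantic state diverges,
    or None if the (zipped) traces agree throughout.

    Each trace entry is a hypothetical (line_number, variable_state) pair;
    only the states are compared, since line numbers naturally differ
    between candidate and reference programs.
    """
    for i, (cand_step, ref_step) in enumerate(zip(cand_trace, ref_trace)):
        if cand_step[1] != ref_step[1]:
            return i
    return None


def localized_advantages(advantage, cand_trace, ref_trace,
                         line_to_token_span, num_tokens):
    """Assign the scalar GRPO advantage only to the token span of the
    earliest divergent line, zeroing (masking) all other tokens.

    Falls back to uniform credit when no divergence is detected, which
    mirrors standard GRPO behavior.
    """
    adv = [0.0] * num_tokens
    i = earliest_divergence(cand_trace, ref_trace)
    if i is None:
        return [advantage] * num_tokens          # no localization possible
    divergent_line = cand_trace[i][0]
    start, end = line_to_token_span[divergent_line]
    for t in range(start, end):
        adv[t] = advantage                       # credit only this span
    return adv
```

On a toy example where the candidate's state first diverges on its third traced line, only the tokens of that line would receive the (negative) advantage, while earlier, still-correct tokens and all downstream tokens are masked out of the update.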