Guided Asymmetric Self-Play (GASP) is introduced to address the limitations of unguided asymmetric self-play in training coding LLMs by incorporating real-data goalpost questions as grounding. GASP guides the teacher model to generate easier and harder variants of these challenging questions, creating a curriculum that gradually bridges the gap to the goalpost. Experiments on LiveCodeBench demonstrate that GASP improves pass@20 by 2.5% over unguided asymmetric self-play and enables the model to solve previously unreachable hard goalpost questions.
By strategically guiding self-play with challenging real-world examples, GASP yields a 2.5% pass@20 improvement in coding LLMs and solves goalpost questions that were previously out of reach.
Asymmetric self-play has emerged as a promising paradigm for post-training large language models, in which a teacher continually generates questions for a student to solve at the edge of the student's learnability. Although these methods promise open-ended data generation bootstrapped from no human data, they suffer from one major problem: not all problems that are hard to solve are interesting or informative for improving the overall capabilities of the model. Current asymmetric self-play methods are goal-agnostic, with no grounding in real data. We propose Guided Asymmetric Self-Play (GASP), where grounding is provided by real-data goalpost questions identified as posing a hard exploration challenge to the model. During self-play, the teacher first generates an easier variant of a hard question, and then a harder variant of that easier question, with the goal of gradually closing the gap to the goalpost over the course of training. In doing so, we improve pass@20 on LiveCodeBench (LCB) by 2.5% over unguided asymmetric self-play, and through the curriculum constructed by the teacher, we solve hard goalpost questions that remain out of reach for all baselines.
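The easier-then-harder variant loop described in the abstract can be sketched as follows. Everything here is an illustrative assumption rather than the paper's implementation: the `Teacher` and `Student` classes, the scalar `difficulty` stand-in for real coding questions, and the skill-update rule are all toy constructs used only to show the shape of the curriculum that bridges toward a goalpost.

```python
# Toy sketch of the guided self-play loop (assumed structure, not the
# paper's code): questions are reduced to a scalar difficulty, and the
# student's "skill" grows when it solves questions at the edge of its
# ability, mimicking learning from a curriculum.

from dataclasses import dataclass


@dataclass
class Question:
    difficulty: float  # stand-in for a real coding problem


class Student:
    """Toy student: solves anything at or below its current skill."""

    def __init__(self, skill: float):
        self.skill = skill

    def attempt(self, q: Question) -> bool:
        solved = q.difficulty <= self.skill
        if solved:
            # Solving at the edge of ability nudges skill upward.
            self.skill = max(self.skill, q.difficulty + 0.1)
        return solved


class Teacher:
    """Toy teacher: proposes an easier, then a harder, variant of a goalpost."""

    def easier_variant(self, q: Question, student: Student) -> Question:
        # Generate a variant just within the student's current reach.
        return Question(min(q.difficulty, student.skill))

    def harder_variant(self, q: Question, goalpost: Question) -> Question:
        # Step the easier variant partway back toward the goalpost.
        return Question(q.difficulty + 0.5 * (goalpost.difficulty - q.difficulty))


def gasp_round(teacher: Teacher, student: Student, goalpost: Question) -> bool:
    """One round: easier variant, then a harder bridge toward the goalpost."""
    easy = teacher.easier_variant(goalpost, student)
    student.attempt(easy)
    hard = teacher.harder_variant(easy, goalpost)
    student.attempt(hard)
    return student.attempt(goalpost)


student = Student(skill=1.0)
goal = Question(difficulty=2.0)  # initially out of the student's reach
for _ in range(20):
    if gasp_round(Teacher(), student, goal):
        break
print(student.attempt(goal))  # the curriculum eventually closes the gap
```

Under these toy dynamics, the goalpost is unsolvable at the start (skill 1.0 vs. difficulty 2.0), but the sequence of easier and bridging variants raises the student's skill until the goalpost itself is solved, which is the qualitative behavior the paper attributes to the teacher-built curriculum.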