Feb 25, 2026 · arXiv:2602.22124

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson

AI Summary

The paper introduces SWE-Protégé, a post-training framework that enables small language models (SLMs) to perform better on long-horizon software engineering tasks by selectively collaborating with a strong expert model. The SLM learns to recognize stalled states and follow expert feedback, while remaining the sole decision-maker. By combining supervised fine-tuning on expert-augmented trajectories with reinforcement learning to discourage looping and unproductive collaboration, the authors achieve a 42.4% Pass@1 on SWE-bench Verified using a lightly post-trained Qwen2.5-Coder-7B-Instruct.

Key Contribution

SLMs can sharply improve on complex software engineering tasks by learning *when* to ask a larger model for help: the paper reports a +25.4% Pass@1 improvement over the prior SLM state of the art on SWE-bench Verified while querying the expert sparsely (~4 calls per task).

Abstract

Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).
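The collaboration pattern described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: in the paper the SLM learns when to ask for help through post-training, whereas the sketch below swaps in a hand-written repetition heuristic. The names `is_stalled`, `run_episode`, `toy_policy`, and `toy_expert`, the window size, the repeat threshold, and the use of the reported ~4 calls per task as a hard budget are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Policy = Callable[[List[str], Optional[str]], str]
Expert = Callable[[List[str]], str]


def is_stalled(actions: List[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Heuristic stall check (assumption): some action occurs at least
    `max_repeats` times within the last `window` actions."""
    recent = actions[-window:]
    if not recent:
        return False
    return max(Counter(recent).values()) >= max_repeats


def run_episode(policy: Policy, expert: Expert,
                max_steps: int = 20, expert_budget: int = 4) -> Tuple[List[str], int]:
    """The policy (standing in for the SLM) chooses every action; the expert
    only supplies a hint when a stall is detected and budget remains."""
    history: List[str] = []
    since_help: List[str] = []  # actions taken since the last expert call
    expert_calls = 0
    hint: Optional[str] = None
    for _ in range(max_steps):
        action = policy(history, hint)
        hint = None  # a hint is shown to the policy for one step only
        history.append(action)
        if action == "submit":
            break
        since_help.append(action)
        if expert_calls < expert_budget and is_stalled(since_help):
            hint = expert(history)
            expert_calls += 1
            since_help.clear()  # restart stall detection after getting help
    return history, expert_calls


# Hypothetical toy policy: loops on "ls" until it receives a hint.
def toy_policy(history: List[str], hint: Optional[str]) -> str:
    if hint is not None:
        return "edit"
    if history and history[-1] == "edit":
        return "submit"
    return "ls"


def toy_expert(history: List[str]) -> str:
    return "stop listing files and edit the failing function"


trajectory, calls = run_episode(toy_policy, toy_expert)
```

In this toy run the policy repeats the same action three times, receives one hint, edits, and submits. A learned protégé would replace both the stall heuristic and the fixed budget with behavior acquired during fine-tuning and reinforcement learning.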

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 33

Year: 2026

Venue: N/A
