The paper introduces SWE-Protégé, a post-training framework that enables small language models (SLMs) to perform better on long-horizon software engineering tasks by selectively collaborating with a strong expert model. The SLM learns to recognize stalled states and follow expert feedback, while remaining the sole decision-maker. By combining supervised fine-tuning on expert-augmented trajectories with reinforcement learning to discourage looping and unproductive collaboration, the authors achieve a 42.4% Pass@1 on SWE-bench Verified using a lightly post-trained Qwen2.5-Coder-7B-Instruct.
SLMs can close much of the gap on complex software engineering tasks by learning *when* to ask larger models for help, gaining +25.4% on SWE-bench Verified over the prior SLM state of the art with only about 4 expert calls per task.
Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).
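To make the collaboration pattern in the abstract concrete, the following is a minimal sketch, not the authors' implementation. In SWE-Protégé the SLM learns when to seek guidance. Here a hand-written repetition heuristic stands in for that learned decision. Every name (`is_stalled`, `run_episode`, `toy_policy`, `toy_expert`), the window and repeat thresholds, and the action strings are hypothetical. The default budget of 4 only echoes the reported average of about 4 expert calls per task.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: a policy maps (history, hint) to the next action,
# and an expert maps the history to a piece of advice.
Policy = Callable[[List[str], Optional[str]], str]
Expert = Callable[[List[str]], str]


def is_stalled(history: List[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Assumed heuristic: some action repeats max_repeats+ times in the last `window` steps."""
    recent = history[-window:]
    return bool(recent) and max(Counter(recent).values()) >= max_repeats


def run_episode(
    policy: Policy, expert: Expert, max_steps: int = 20, expert_budget: int = 4
) -> Tuple[List[str], int]:
    """Run one episode; the policy picks every action, the expert only supplies hints."""
    history: List[str] = []
    hint: Optional[str] = None
    expert_calls = 0
    since = 0  # index of the first action taken after the latest expert call
    for _ in range(max_steps):
        # Consult the expert only when the recent actions look like a loop
        # and the call budget is not exhausted.
        if expert_calls < expert_budget and is_stalled(history[since:]):
            hint = expert(history)
            expert_calls += 1
            since = len(history)
        action = policy(history, hint)  # the small model stays the decision-maker
        history.append(action)
        if action == "submit":
            break
    return history, expert_calls


def toy_policy(history: List[str], hint: Optional[str]) -> str:
    """Toy stand-in for the SLM: loops until it gets a hint, follows it, then submits."""
    if hint is None:
        return "run_tests"
    return "submit" if history and history[-1] == hint else hint


def toy_expert(history: List[str]) -> str:
    """Toy stand-in for the expert model: always suggests editing a file."""
    return "edit_file"


if __name__ == "__main__":
    actions, calls = run_episode(toy_policy, toy_expert)
    print(actions, calls)
```

With the toy policy, the loop repeats `run_tests` three times, triggers one expert call, follows the hint, and submits. The expert is kept as an advisory callable that never writes to `history`, which mirrors the abstract's point that the SLM remains the sole decision-maker.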