This paper introduces Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework that improves GUI agents by closing the supervision gap on off-trajectory states encountered during closed-loop execution. SGCD first lets the plain policy run for a few steps to reach realistic off-trajectory states, then uses a skill-guided policy, with skills extracted from both successful and failed rollouts, to complete the task from those states. The resulting successful continuations are mixed with expert trajectories to supervise states that the expert data never covers. On the OSWorld-Verified benchmark, SGCD raises the success rate of three base models from the low-30% range to over 50%.
Closing the supervision gap on off-trajectory states raises GUI agent success rates on OSWorld-Verified from the low-30% range to over 50%.
Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably encounters policy-induced off-trajectory states during closed-loop execution, i.e., states that fall outside the expert trajectories. Since expert trajectories provide no demonstrations for these unseen states, such states receive no effective supervision, leaving the policy unable to select the correct action. To close this supervision gap, we propose Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework. SGCD first runs the plain policy without skill guidance for a few steps to reach realistic off-trajectory states. From these states, a skill-guided policy then completes the task and produces successful continuations, which are mixed with expert trajectories to supply supervision over policy-induced off-trajectory states. The skills are extracted from both successful and failed rollouts, consisting of Continuation Plans, Critical Targets, Failure Traps, and Success Criteria. On OSWorld-Verified, SGCD improves the success rate of three base models from the low-30% range to over 50%, demonstrating its effectiveness and generality.
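The data-collection step described in the abstract can be sketched in a toy setting. This is a minimal illustration under stated assumptions, not the authors' implementation: the integer-state environment, the names `collect_continuation` and `build_training_set`, the field types of `Skill`, and the choice to keep only the guided steps (rather than the plain-policy prefix) are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Toy stand-in for a GUI step: (state, action), where both are integers.
Step = Tuple[int, int]


@dataclass
class Skill:
    # The four skill components named in the abstract; field types are assumed.
    continuation_plan: str
    critical_targets: List[str]
    failure_traps: List[str]
    success_criteria: str


def collect_continuation(
    plain_policy: Callable[[int], int],
    guided_policy: Callable[[int, Skill], int],
    skill: Skill,
    start: int,
    goal: int,
    prefix_steps: int,
    max_steps: int,
) -> Optional[List[Step]]:
    """Run the plain policy for a few steps, then let the guided policy finish.

    Returns the guided (state, action) steps if the goal is reached within
    max_steps, otherwise None (failed continuations are discarded).
    """
    state = start
    # Phase 1: unguided rollout to reach a policy-induced off-trajectory state.
    for _ in range(prefix_steps):
        state += plain_policy(state)

    # Phase 2: skill-guided continuation from that state.
    continuation: List[Step] = []
    for _ in range(max_steps):
        if state == goal:
            return continuation
        action = guided_policy(state, skill)
        continuation.append((state, action))
        state += action
    return continuation if state == goal else None


def build_training_set(
    expert_steps: List[Step], continuations: List[List[Step]]
) -> List[Step]:
    """Mix expert steps with the steps of all successful continuations."""
    mixed = list(expert_steps)
    for continuation in continuations:
        mixed.extend(continuation)
    return mixed
```

In this toy, a plain policy that always steps away from the goal produces states the expert never visits, and the guided continuation supplies (state, action) pairs for exactly those states. In the iterative framework the abstract describes, the policy would be retrained on the mixed set and the collection repeated; that outer loop is omitted here.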