Tsinghua AI · NTU · StepFun · University of Science and Technology · Jun 17, 2026 · arXiv:2606.18890

Skill-Guided Continuation Distillation for GUI Agents

Zhimin Fan, Hongwei Yu, Yeqing Shen, Haolong Yan, Guozhen Peng, Tianhao Peng, Yudong Zhang, Xiaowen Zhang, Kaijun Tan, Zheng Ge, Xiangyu Zhang, Daxin Jiang

AI Summary

This paper introduces Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework for GUI agents. It targets the supervision gap at off-trajectory states: states that the policy reaches during closed-loop execution but that expert trajectories never cover. SGCD first lets the plain policy act for a few steps to reach realistic off-trajectory states, then uses a skill-guided policy to complete the task from those states. The skills are extracted from both successful and failed rollouts, and the resulting successful continuations are mixed with expert trajectories to supervise these states. On the OSWorld-Verified benchmark, SGCD raises the success rate of three base models from the low-30% range to over 50%, indicating that the approach generalizes across base models.

Key Contribution

Supervising policy-induced off-trajectory states with skill-guided continuations raises the OSWorld-Verified success rate of three base models from the low-30% range to over 50%.

Abstract

Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably encounters policy-induced off-trajectory states during closed-loop execution, i.e., states that fall outside the expert trajectories. Since expert trajectories provide no demonstrations for these unseen states, such states receive no effective supervision, leaving the policy unable to select the correct action. To close this supervision gap, we propose Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework. SGCD first runs the plain policy without skill guidance for a few steps to reach realistic off-trajectory states. From these states, a skill-guided policy then completes the task and produces successful continuations, which are mixed with expert trajectories to supply supervision over policy-induced off-trajectory states. The skills are extracted from both successful and failed rollouts, consisting of Continuation Plans, Critical Targets, Failure Traps, and Success Criteria. On OSWorld-Verified, SGCD improves the success rate of three base models from the low-30% range to over 50%, demonstrating its effectiveness and generality.
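The two-phase data collection described in the abstract can be illustrated with a small Python sketch of one iteration. Everything here is a hypothetical stand-in, not the authors' implementation: the `Skill` record (which only mirrors the four named components), the callables `reset`, `step`, `plain_policy`, `guided_policy`, and `is_success`, the parameters `prefix_steps` and `max_steps`, and the plain concatenation used for mixing. The sketch also assumes the skill is dropped from the stored (state, action) pairs, so that the plain policy learns to act without guidance; skill extraction itself is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Any                  # placeholder for a GUI observation (hypothetical)
Action = Any                 # placeholder for a GUI action (hypothetical)
Step = Tuple[State, Action]  # one supervised (state, action) pair


@dataclass
class Skill:
    """Hypothetical record holding the four skill components named in the abstract."""
    continuation_plan: List[str]
    critical_targets: List[str]
    failure_traps: List[str]
    success_criteria: List[str]


def collect_continuations(
    tasks: List[str],
    reset: Callable[[str], State],
    step: Callable[[State, Action], State],
    plain_policy: Callable[[State], Action],
    guided_policy: Callable[[State, Skill], Action],
    skills: Dict[str, Skill],
    is_success: Callable[[str, State], bool],
    prefix_steps: int,
    max_steps: int,
) -> List[List[Step]]:
    """Return the successful, non-empty skill-guided continuations (at most one per task)."""
    continuations: List[List[Step]] = []
    for task in tasks:
        state = reset(task)
        # Phase 1: the plain policy (no skill) acts for a few steps and may
        # drift into a realistic, policy-induced off-trajectory state.
        for _ in range(prefix_steps):
            state = step(state, plain_policy(state))
        # Phase 2: the skill-guided policy tries to finish the task from there.
        trace: List[Step] = []
        for _ in range(max_steps):
            if is_success(task, state):
                break
            action = guided_policy(state, skills[task])
            trace.append((state, action))  # assumption: the skill is not stored
            state = step(state, action)
        # Keep the continuation only if it contains steps and completed the task.
        if trace and is_success(task, state):
            continuations.append(trace)
    return continuations


def build_training_mix(expert: List[List[Step]], continuations: List[List[Step]]) -> List[Step]:
    """Flatten expert trajectories and continuations into one behavior-cloning set."""
    return [pair for traj in expert + continuations for pair in traj]


if __name__ == "__main__":
    # Toy 1-D task: start at 0, succeed on reaching position 2.
    conts = collect_continuations(
        tasks=["reach_2"],
        reset=lambda task: 0,
        step=lambda s, a: s + 1 if a == "right" else s - 1,
        plain_policy=lambda s: "left",  # drifts away from the goal
        guided_policy=lambda s, sk: sk.continuation_plan[0],
        skills={"reach_2": Skill(["right"], [], [], [])},
        is_success=lambda task, s: s == 2,
        prefix_steps=2,
        max_steps=10,
    )
    print(conts)  # [[(-2, 'right'), (-1, 'right'), (0, 'right'), (1, 'right')]]
```

Filtering on success is what turns guided rollouts into supervision for states the expert data never visits; in this sketch failed continuations are simply discarded, whereas SGCD also uses failed rollouts for skill extraction.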

Scalable Oversight & Alignment Theory · Tool Use & Agents

Year: 2026 · Venue: N/A