Mar 3, 2026
arXiv:2603.02951

CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning

Zhenquan Yao, Zitong Huang, Yihan Zeng, Jianhua Han, Chun-Mei Feng, Jianwei Ma, Wangmeng Zuo

AI Summary

This paper addresses the challenge of continual learning for GUI agents, where frequent GUI updates require adaptation without forgetting previously learned tasks. The authors propose a Continual GUI Learning (CGL) framework that dynamically balances Supervised Fine-Tuning (SFT) for adaptation efficiency and Reinforcement Learning (RL) for skill retention. CGL incorporates an SFT proportion adjustment mechanism guided by policy entropy and a gradient surgery strategy to mitigate gradient interference, achieving superior performance on a newly introduced AndroidControl-CL benchmark.

Key Contribution

RL's inherent resilience to catastrophic forgetting can be harnessed to improve continual learning in GUI agents, outperforming SFT alone.

Abstract

Graphical User Interface (GUI) agents have advanced significantly thanks to recent progress in multimodal large language models (MLLMs). However, because GUI applications are updated frequently, adapting to new tasks without forgetting old ones remains an open problem in continual GUI learning. In this work, we reveal that while Supervised Fine-Tuning (SFT) facilitates fast adaptation, it often triggers knowledge overwriting, whereas Reinforcement Learning (RL) demonstrates an inherent resilience that shields prior interaction logic from erasure. Based on this insight, we propose a Continual GUI Learning (CGL) framework that dynamically balances adaptation efficiency and skill retention by enhancing the synergy between SFT and RL. Specifically, we introduce an SFT proportion adjustment mechanism guided by policy entropy to dynamically control the weight allocation between the SFT and RL training phases. To resolve explicit gradient interference, we further develop a specialized gradient surgery strategy: by projecting exploratory SFT gradients onto GRPO-based anchor gradients, our method explicitly clips the components of SFT gradients that conflict with GRPO. In addition, we establish the AndroidControl-CL benchmark, which divides GUI applications into distinct task groups to simulate and evaluate continual GUI learning. Experimental results demonstrate the effectiveness of the proposed CGL framework across continual learning scenarios. The benchmark, code, and model will be made publicly available.
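The gradient surgery step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so the code below assumes a PCGrad-style projection: when the SFT gradient has a negative inner product with the GRPO anchor gradient, the opposing component is removed, and otherwise the SFT gradient is left unchanged. The function names (`project_out_conflict`, `combine`) and the fixed `sft_weight` scalar are hypothetical; in the paper the SFT proportion is adjusted by policy entropy, which is not modeled here.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(
    g_sft: Sequence[float], g_anchor: Sequence[float], eps: float = 1e-12
) -> List[float]:
    """Assumed PCGrad-style surgery (not confirmed as the paper's exact rule).

    If g_sft opposes g_anchor (negative inner product), subtract the
    projection of g_sft onto g_anchor so the result no longer opposes
    the anchor. Otherwise return g_sft unchanged.
    """
    inner = dot(g_sft, g_anchor)
    if inner >= 0.0:
        return list(g_sft)  # no conflict, nothing to clip
    scale = inner / (dot(g_anchor, g_anchor) + eps)
    return [s - scale * a for s, a in zip(g_sft, g_anchor)]


def combine(
    g_sft: Sequence[float], g_anchor: Sequence[float], sft_weight: float
) -> List[float]:
    """Mix the cleaned SFT gradient with the anchor gradient.

    sft_weight is a hypothetical fixed scalar in [0, 1]; the paper sets
    the SFT proportion from policy entropy instead.
    """
    cleaned = project_out_conflict(g_sft, g_anchor)
    return [sft_weight * s + (1.0 - sft_weight) * a
            for s, a in zip(cleaned, g_anchor)]


if __name__ == "__main__":
    g_sft = [1.0, -1.0]    # toy SFT gradient that opposes the anchor
    g_grpo = [0.0, 1.0]    # toy GRPO anchor gradient
    print(project_out_conflict(g_sft, g_grpo))
    print(combine(g_sft, g_grpo, sft_weight=0.5))
```

In this toy case the second coordinate of the SFT gradient pulls against the anchor, so the projection removes it while leaving the first coordinate, which is orthogonal to the anchor, intact. In a real training loop the vectors would be the flattened parameter gradients of the SFT and GRPO losses.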

Multimodal Models, RLHF & Preference Learning, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
