Apr 13, 2026 · arXiv:2604.11784

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

AI Summary

ClawGUI is an open-source framework that streamlines the training, evaluation, and deployment of GUI agents, addressing limitations in existing infrastructure. It offers ClawGUI-RL for RL training with virtual and physical device support, ClawGUI-Eval for standardized evaluation across benchmarks, and ClawGUI-Agent for deploying agents on mobile platforms with personalized memory. ClawGUI-2B, trained end to end within the framework, reaches a 17.1% success rate on MobileWorld GUI-Only, surpassing the MAI-UI-2B baseline.

Key Contribution

A unified open-source framework for training, evaluating, and deploying GUI agents across real devices and chat platforms, closing the gap between research and real-world application.

Abstract

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework addressing these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, ClawGUI-2B achieves 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.
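As a rough illustration of the interaction model the abstract describes (taps, swipes, and keystrokes) and of how a success-rate metric is typically computed, here is a minimal Python sketch. The class names `Tap`, `Swipe`, and `TypeText` and the helper `success_rate` are hypothetical and are not taken from the ClawGUI codebase; the metric is assumed to be the plain fraction of episodes in which the task was completed.

```python
from dataclasses import dataclass
from typing import Sequence, Union


# Hypothetical action types; the names are illustrative, not ClawGUI's API.
@dataclass(frozen=True)
class Tap:
    x: int  # screen coordinate in pixels
    y: int


@dataclass(frozen=True)
class Swipe:
    x1: int  # start point
    y1: int
    x2: int  # end point
    y2: int


@dataclass(frozen=True)
class TypeText:
    text: str  # keystrokes sent to the focused field


Action = Union[Tap, Swipe, TypeText]


def success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of episodes whose task was completed.

    Assumes one boolean per evaluated episode (True = task succeeded).
    """
    if not outcomes:
        raise ValueError("at least one episode outcome is required")
    return 100.0 * sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # A made-up trajectory and made-up outcomes, for illustration only.
    trajectory = [Tap(120, 640), TypeText("hello"), Swipe(300, 900, 300, 200)]
    print(len(trajectory), "actions")
    print(success_rate([True, False, False, False]), "% success")
```

Under this assumed definition, a reported 17.1% Success Rate would mean roughly 17 of every 100 evaluated tasks were completed.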

Eval Frameworks & Benchmarks · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 59

Year: 2026

Venue: N/A
