Apr 2, 2026 | arXiv:2604.01676

GPA: Learning GUI Process Automation from Demonstrations

Zirui Zhao, J. Liew, Jun Hao Liew, Yan Yang, Wenzhuo Yang, Ziyang Luo, Doyen Sahoo, Silvio Savarese, Junnan Li

AI Summary

This paper introduces GUI Process Automation (GPA), a vision-based RPA system that uses Sequential Monte Carlo localization for robust GUI element identification, readiness calibration for deterministic action execution, and fully local execution for privacy. In a pilot experiment on long-horizon GUI tasks, GPA achieves a higher success rate and roughly 10x faster execution than Gemini 3 Pro (with CUA tools). The system can also serve as a GUI execution tool for other agents, allowing them to focus on reasoning and orchestration.

Key Contribution

GPA delivers robust, deterministic, and private GUI automation from a single demonstration. In a pilot experiment on long-horizon GUI tasks, it achieved a higher success rate than Gemini 3 Pro (with CUA tools) while executing about 10 times faster.

Abstract

GUI Process Automation (GPA) is a lightweight but general vision-based Robotic Process Automation (RPA) system that enables fast and stable process replay from only a single demonstration. Addressing the fragility of traditional RPA and the non-deterministic risks of current vision-language-model-based GUI agents, GPA offers three core benefits: (1) robustness, via Sequential Monte Carlo-based localization that handles rescaling and detection uncertainty; (2) determinism and reliability, safeguarded by readiness calibration; and (3) privacy, through fast, fully local execution. This approach delivers the adaptability, robustness, and security required for enterprise workflows. GPA can also be used as an MCP/CLI tool by other agents with coding capabilities, so that the agent only reasons and orchestrates while GPA handles the GUI execution. We conducted a pilot experiment comparing GPA with Gemini 3 Pro (with CUA tools) and found that GPA achieves a higher success rate and 10 times faster execution on long-horizon GUI tasks.
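
To make the idea of Sequential Monte Carlo-based localization more concrete, the following is a minimal, generic particle-filter sketch and not GPA's actual implementation. It assumes a hypothetical detector that emits one noisy (x, y) candidate centre per frame, and it tracks position only (the rescaling handling mentioned in the abstract is omitted). The function name `smc_localize` and all parameters are illustrative assumptions.

```python
import math
import random


def smc_localize(detections, width, height, n_particles=500,
                 sigma=20.0, jitter=5.0, seed=0):
    """Estimate an element's (x, y) centre from noisy per-frame detections.

    `detections` is a list of (x, y) candidates from a hypothetical detector;
    this interface is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    # Initialise: spread particles uniformly over the screen.
    particles = [(rng.uniform(0, width), rng.uniform(0, height))
                 for _ in range(n_particles)]

    for obs_x, obs_y in detections:
        # Predict: diffuse particles slightly to model small layout drift.
        particles = [(px + rng.gauss(0, jitter), py + rng.gauss(0, jitter))
                     for px, py in particles]
        # Update: weight each particle by a Gaussian likelihood of the detection.
        weights = [math.exp(-((px - obs_x) ** 2 + (py - obs_y) ** 2)
                            / (2 * sigma ** 2))
                   for px, py in particles]
        if sum(weights) == 0.0:
            # Every particle is too far away: fall back to uniform weights.
            weights = [1.0] * n_particles
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)

    # Point estimate: mean of the resampled particle cloud.
    est_x = sum(p[0] for p in particles) / n_particles
    est_y = sum(p[1] for p in particles) / n_particles
    return est_x, est_y


# Usage: ten noisy detections of a button whose true centre is (120, 80).
frames = [(120 + d, 80 - d) for d in (-3, 2, 0, 1, -2, 3, -1, 0, 2, -2)]
print(smc_localize(frames, width=400, height=300, n_particles=2000))
```

The particle cloud represents uncertainty about where the element is, so a single spurious detection shifts the estimate only partially instead of redirecting the click outright. A fuller version would presumably add a scale dimension to the state and score particles against visual evidence rather than a single detection point.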

Computer Vision | Robotics & Embodied AI | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 37

Year: 2026

Venue: N/A
