This paper introduces "Turing Test on Screen," a benchmark for evaluating the humanization of mobile GUI agents, framing the interaction as a MinMax optimization problem between an agent and a detector. The authors collect a high-fidelity dataset of mobile touch dynamics and find that vanilla LMM-based agents are easily detectable due to their unnatural kinematics. They then propose and evaluate methods, ranging from heuristic noise injection to data-driven behavioral matching, that improve agent imitability without sacrificing task performance.
LMM-based GUI agents stick out like a sore thumb in human-centric mobile environments, but simple techniques can make them blend in without sacrificing utility.
The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-centric ecosystems, they must evolve Humanization capabilities. We introduce the "Turing Test on Screen," formally modeling the interaction as a MinMax optimization problem between a detector and an agent that aims to minimize behavioral divergence. We then collect a new high-fidelity dataset of mobile touch dynamics and find that vanilla LMM-based agents are easily detectable due to unnatural kinematics. Consequently, we establish the Agent Humanization Benchmark (AHB) and detection metrics to quantify the trade-off between imitability and utility. Finally, we propose methods ranging from heuristic noise to data-driven behavioral matching, demonstrating both theoretically and empirically that agents can achieve high imitability without sacrificing performance. This work shifts the paradigm from whether an agent can perform a task to how it performs it within a human-centric ecosystem, laying the groundwork for seamless coexistence in adversarial digital environments.
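To make the heuristic-noise idea concrete, here is a minimal sketch of perturbing a perfectly linear synthetic swipe with eased pacing, spatial noise, and timing jitter so its kinematics look less machine-like. All names and parameter values here are hypothetical illustrations, not the paper's actual method or settings:

```python
import math
import random

def humanize_swipe(start, end, n_points=20, spatial_sigma=2.0,
                   jitter=0.15, seed=None):
    """Turn a straight-line synthetic swipe into a noisier trajectory.

    start, end: (x, y) pixel coordinates of the gesture endpoints.
    spatial_sigma: std-dev of Gaussian positional noise, in pixels.
    jitter: relative noise applied to inter-sample time deltas.
    Returns a list of (x, y, t) tuples with t in seconds.
    """
    rng = random.Random(seed)
    points = []
    t = 0.0
    base_dt = 0.016  # ~60 Hz sampling, a common touch-event rate
    for i in range(n_points):
        frac = i / (n_points - 1)
        # Ease-in/ease-out pacing: humans accelerate, then decelerate
        eased = 0.5 - 0.5 * math.cos(math.pi * frac)
        x = start[0] + (end[0] - start[0]) * eased + rng.gauss(0, spatial_sigma)
        y = start[1] + (end[1] - start[1]) * eased + rng.gauss(0, spatial_sigma)
        points.append((x, y, t))
        # Jittered timestamps so inter-event intervals are not constant
        t += base_dt * (1 + rng.uniform(-jitter, jitter))
    return points

trace = humanize_swipe((100, 800), (100, 200), seed=42)
```

A detector keying on constant velocity or exact inter-event timing would flag the un-noised straight line immediately; even this simple perturbation removes those two trivial tells, though the paper's data-driven behavioral matching goes further by fitting real touch dynamics.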