This paper introduces "Turing Test on Screen," a benchmark for evaluating the humanization of mobile GUI agents, framing the interaction as a MinMax optimization problem between an agent and a detector. The authors collect a high-fidelity dataset of mobile touch dynamics and find that vanilla LMM-based agents are easily detectable due to their unnatural kinematics. They then propose and evaluate methods, ranging from heuristic noise injection to data-driven behavioral matching, that improve agent imitability without sacrificing task performance.
LMM-based GUI agents stick out like a sore thumb in human-centric mobile environments, but simple techniques can make them blend in without sacrificing utility.
The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-centric ecosystems, they must evolve Humanization capabilities. We introduce the "Turing Test on Screen," formally modeling the interaction as a MinMax optimization problem between a detector and an agent that aims to minimize behavioral divergence. We then collect a new high-fidelity dataset of mobile touch dynamics and find that vanilla LMM-based agents are easily detectable due to unnatural kinematics. Consequently, we establish the Agent Humanization Benchmark (AHB) and detection metrics to quantify the trade-off between imitability and utility. Finally, we propose methods ranging from heuristic noise to data-driven behavioral matching, demonstrating both theoretically and empirically that agents can achieve high imitability without sacrificing performance. This work shifts the paradigm from whether an agent can perform a task to how it performs it within a human-centric ecosystem, laying the groundwork for seamless coexistence in adversarial digital environments.
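To make the heuristic-noise idea concrete, here is a minimal sketch of perturbing a perfectly linear synthetic swipe with eased pacing, spatial noise, and timing jitter so its kinematics look less machine-like. All names and parameter values here are hypothetical illustrations, not the paper's actual method or settings:

```python
import math
import random

def humanize_swipe(start, end, n_points=20, spatial_sigma=2.0,
                   jitter=0.15, seed=None):
    """Turn a straight-line synthetic swipe into a noisier trajectory.

    start, end: (x, y) pixel coordinates of the gesture endpoints.
    spatial_sigma: std-dev of Gaussian positional noise, in pixels.
    jitter: relative noise applied to inter-sample time deltas.
    Returns a list of (x, y, t) tuples with t in seconds.
    """
    rng = random.Random(seed)
    points = []
    t = 0.0
    base_dt = 0.016  # ~60 Hz sampling, a common touch-event rate
    for i in range(n_points):
        frac = i / (n_points - 1)
        # Ease-in/ease-out pacing: humans accelerate, then decelerate
        eased = 0.5 - 0.5 * math.cos(math.pi * frac)
        x = start[0] + (end[0] - start[0]) * eased + rng.gauss(0, spatial_sigma)
        y = start[1] + (end[1] - start[1]) * eased + rng.gauss(0, spatial_sigma)
        points.append((x, y, t))
        # Jittered timestamps so inter-event intervals are not constant
        t += base_dt * (1 + rng.uniform(-jitter, jitter))
    return points

trace = humanize_swipe((100, 800), (100, 200), seed=42)
```

A detector keying on constant velocity or exact inter-event timing would flag the un-noised straight line immediately; even this simple perturbation removes those two trivial tells, though the paper's data-driven behavioral matching goes further by fitting real touch dynamics.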