The paper introduces AndroidWMSearch, a tree search framework for mobile agents on Android that uses a learned world model to simulate the environment and evaluate actions before execution, addressing the challenge of irreversible operations on real devices. The authors train specialized LLMs as world models using a scalable data synthesis pipeline. Experiments on the AndroidWorld benchmark show that AndroidWMSearch outperforms the T3A agent by 4.7% and achieves a 3.0% performance gain over GPT-4o when using a dedicated Android-trained world model.
Training a specialized LLM as a world model for Android environments yields a 3.0% performance gain over GPT-4o in mobile agent tree search.
Mobile agents powered by large language models (LLMs) have demonstrated remarkable potential in automating operations on mobile devices. Recent studies have shown that incorporating tree search methods and increasing test-time computation can enhance an agent's multi-step reasoning and planning capabilities. However, unlike simulated sandbox environments, Android is a dynamic environment with many irreversible operations, making tree search backtracking less feasible on the Android platform. To address this challenge, we propose AndroidWMSearch, a novel agent tree search framework that leverages a world model to emulate the Android environment. This framework allows the agent to evaluate and rank candidate actions through simulation before actual execution. We systematically explore this paradigm by: (1) proposing a model-based Android tree search framework, AndroidWMSearch, in which LLMs are utilized both as world models and as value functions; and (2) training specialized LLMs to act as world models, using a scalable data synthesis pipeline. On the AndroidWorld benchmark, AndroidWMSearch surpasses the T3A agent by 4.7%, underscoring the effectiveness of our proposed framework. Moreover, using our AndroidWM-7B, which is specifically trained for Android environments, as the world model yields a 3.0% performance gain compared to employing GPT-4o. These findings highlight the importance and efficacy of training a dedicated world model tailored for mobile agents.
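The core idea of the abstract, simulating candidate actions with a world model and ranking them by a value function instead of executing them on a real device, can be illustrated with a minimal sketch. All names and interfaces here (`rank_actions`, `world_model`, `value_fn`, the toy stand-ins) are assumptions for illustration, not the paper's actual API; in AndroidWMSearch both roles would be filled by LLM calls.

```python
# Hypothetical sketch: simulate each candidate action with a world model,
# score the predicted next state with a value function, and rank candidates
# best-first. No action is executed on the device, which is what makes the
# approach safe in the presence of irreversible operations.
from typing import Callable, List, Tuple

def rank_actions(
    state: str,
    candidates: List[str],
    world_model: Callable[[str, str], str],  # (state, action) -> predicted next state
    value_fn: Callable[[str], float],        # predicted state -> scalar value
) -> List[Tuple[str, float]]:
    """Return (action, score) pairs sorted from best to worst."""
    scored = []
    for action in candidates:
        predicted = world_model(state, action)  # simulation only, nothing executed
        scored.append((action, value_fn(predicted)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for the LLM world model and value function.
def toy_world_model(state: str, action: str) -> str:
    return f"{state} -> {action}"

def toy_value(predicted_state: str) -> float:
    return 1.0 if "tap_confirm" in predicted_state else 0.0

ranked = rank_actions(
    "settings_screen", ["tap_cancel", "tap_confirm"], toy_world_model, toy_value
)
best_action, best_score = ranked[0]
```

The agent would then execute only `best_action` in the real environment, so the tree search never needs to backtrack across an irreversible step.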