SecAgent, a 3B-parameter mobile GUI agent, was developed to address two limitations of existing approaches: the scarcity of high-quality multilingual datasets and inefficient history representation. The authors built a human-verified Chinese mobile GUI dataset containing 18k grounding samples and 121k navigation steps across 44 applications, along with a Chinese navigation benchmark. SecAgent employs a semantic context mechanism that distills history into natural language summaries, reducing computational costs while achieving performance comparable to larger 7B-8B models on both the new benchmark and public benchmarks.
A 3B model can match the performance of models more than twice its size in mobile GUI automation by distilling visual history into concise natural language summaries.
Mobile Graphical User Interface (GUI) agents powered by multimodal large language models have demonstrated promising capabilities in automating complex smartphone tasks. However, existing approaches face two critical limitations: the scarcity of high-quality multilingual datasets, particularly for non-English ecosystems, and inefficient history representation methods. To address these challenges, we present SecAgent, an efficient mobile GUI agent at 3B scale. We first construct a human-verified Chinese mobile GUI dataset with 18k grounding samples and 121k navigation steps across 44 applications, along with a Chinese navigation benchmark featuring multi-choice action annotations. Building upon this dataset, we propose a semantic context mechanism that distills history screenshots and actions into concise, natural language summaries, significantly reducing computational costs while preserving task-relevant information. Through supervised and reinforcement fine-tuning, SecAgent outperforms similar-scale baselines and achieves performance comparable to 7B-8B models on both our benchmark and public navigation benchmarks. We will open-source the training dataset, benchmark, model, and code to advance research in multilingual mobile GUI automation.
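The semantic context idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the action strings, the `max_steps` cap, and the prompt layout are all hypothetical, and the summaries are supplied by hand here, whereas in a real agent they would presumably be produced by the model. The sketch only shows the general pattern of carrying past steps forward as short text summaries, so that only the current screenshot would need to be passed as an image.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One past navigation step, stored as text only (no screenshot)."""
    action: str   # hypothetical action format, e.g. "tap(search_box)"
    summary: str  # short natural-language description of the step


@dataclass
class SemanticContext:
    """Text-only history for a GUI agent (illustrative sketch only)."""
    task: str
    steps: List[Step] = field(default_factory=list)
    max_steps: int = 8  # assumed cap on retained summaries

    def add(self, action: str, summary: str) -> None:
        self.steps.append(Step(action, summary))
        # Drop the oldest entries once the cap is exceeded.
        if len(self.steps) > self.max_steps:
            self.steps = self.steps[-self.max_steps:]

    def render(self) -> str:
        """Build the text part of the prompt; the current screenshot
        would be attached separately as the only image."""
        lines = [f"Task: {self.task}", "History:"]
        if not self.steps:
            lines.append("  (none)")
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"  {i}. {step.summary} [{step.action}]")
        lines.append("Next action:")
        return "\n".join(lines)


# Usage with hand-written summaries (made up for illustration).
ctx = SemanticContext(task="Search for noodles in a food delivery app")
ctx.add("tap(search_box)", "Opened the search box on the home page")
ctx.add("type('noodles')", "Entered the query 'noodles'")
print(ctx.render())
```

Because the history is plain text, the prompt grows by one short line per step rather than by one screenshot per step, which is the source of the cost reduction the abstract claims.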