ECNU, SMU · Apr 15, 2026 · arXiv:2604.13463

From Exploration to Specification: LLM-Based Property Generation for Mobile App Testing

Yiheng Xiong, Shiwen Song, Bo Ma, Ting Su

AI Summary

PropGen automates property generation for Android apps by using LLMs to synthesize properties from app behaviors observed during functionality-guided exploration. It targets the lack of explicit test oracles in mobile app testing: it systematically uncovers and executes app functionalities, derives properties from the resulting behavioral evidence, and refines imprecise properties based on testing feedback. In experiments on 12 real-world Android apps, PropGen identified and executed valid app functionalities, generated valid properties, and repaired most imprecise ones; the resulting properties exposed 25 previously unknown functional bugs.

Key Contribution

PropGen uses LLMs to automatically generate properties for property-based testing of Android apps. The generated properties uncovered 25 previously unknown functional bugs, many of which were missed by existing functional testing techniques.

Abstract

Mobile apps often suffer from functional bugs that do not cause crashes but instead manifest as incorrect behaviors under specific user interactions. Such bugs are difficult to detect automatically because they often lack explicit test oracles. Property-based testing can effectively expose them by checking intended behavioral properties under diverse interactions. However, it largely depends on manually written properties, which are difficult and expensive to construct, limiting its practical use for mobile apps. To address this limitation, we propose PropGen, an automated approach for generating properties for Android apps. The task is challenging for two reasons: app functionalities are often hard to systematically uncover and execute, and properties are difficult to derive accurately from observed behaviors. To tackle these challenges, PropGen performs functionality-guided exploration to collect behavioral evidence from app executions, synthesizes properties from the collected evidence, and refines imprecise properties based on testing feedback. We implemented PropGen and evaluated it on 12 real-world Android apps. The results show that PropGen can effectively identify and execute valid app functionalities, generate valid properties, and repair most imprecise ones. Across all apps, PropGen identified 1,210 valid functionalities and correctly executed 977 of them, compared with 491 and 187 for the baseline. It generated 985 properties, 912 of which were valid, and repaired 118 of 127 imprecise ones exposed during testing. With the resulting properties, we found 25 previously unknown functional bugs in the latest versions of the subject apps, many of which were missed by existing functional testing techniques.
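
To make the notion of a property concrete, the following is a minimal sketch of property-based testing against a toy in-memory app model. It is not PropGen's implementation: `NotesApp`, `FixedNotesApp`, `Property`, `DELETE_REMOVES_ONE`, and `check_property` are hypothetical names introduced here for illustration, and the precondition / interaction / postcondition structure is an assumed, simplified way to represent a property. The toy app contains a seeded non-crashing bug of the kind the abstract describes.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NotesApp:
    """Toy stand-in for an app under test (hypothetical, not from the paper)."""
    notes: List[str] = field(default_factory=list)

    def add_note(self, text: str) -> None:
        self.notes.append(text)

    def delete_note(self, index: int) -> None:
        del self.notes[index]
        # Seeded functional bug: deleting the first note also drops the
        # last remaining note. Nothing crashes; the state is just wrong.
        if index == 0 and self.notes:
            self.notes.pop()


class FixedNotesApp(NotesApp):
    """Same toy app without the seeded bug."""

    def delete_note(self, index: int) -> None:
        del self.notes[index]


@dataclass
class Property:
    """Assumed shape of a property: when it applies, what to do, what must hold."""
    name: str
    precondition: Callable[[NotesApp], bool]
    interaction: Callable[[NotesApp, random.Random], None]
    postcondition: Callable[[NotesApp, NotesApp], bool]  # (before, after)


DELETE_REMOVES_ONE = Property(
    name="deleting a note removes exactly one note",
    precondition=lambda app: len(app.notes) > 0,
    interaction=lambda app, rng: app.delete_note(rng.randrange(len(app.notes))),
    postcondition=lambda before, after: len(after.notes) == len(before.notes) - 1,
)


def check_property(prop: Property, make_app: Callable[[], NotesApp],
                   trials: int = 200, seed: int = 0) -> List[List[str]]:
    """Drive the app into random states and collect states that violate prop."""
    rng = random.Random(seed)
    counterexamples = []
    for _ in range(trials):
        app = make_app()
        for i in range(rng.randint(0, 5)):  # random setup interactions
            app.add_note(f"note {i}")
        if not prop.precondition(app):
            continue  # the property does not apply in this state
        before = copy.deepcopy(app)
        prop.interaction(app, rng)
        if not prop.postcondition(before, app):
            counterexamples.append(before.notes)
    return counterexamples


if __name__ == "__main__":
    print("buggy app violations:", len(check_property(DELETE_REMOVES_ONE, NotesApp)))
    print("fixed app violations:", len(check_property(DELETE_REMOVES_ONE, FixedNotesApp)))
```

The postcondition here acts as the test oracle: no expected output is written by hand for any particular interaction sequence, yet the seeded bug is flagged whenever a random run deletes the first of several notes. On a real Android app, the interaction and the state checks would go through a UI automation layer rather than direct method calls.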

Code Generation & Program Synthesis · Natural Language Processing · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 68

Year: 2026

Venue: N/A
