PKU, UT Dallas | Jun 15, 2026 | arXiv:2606.16650

Understanding Automated Web GUI Testing: An Empirical Study Across Exploration Strategies and State Abstractions

Chenxu Liu, Wei Yang, Ying Zhang, Tao Xie

AI Summary

This empirical study investigates the interplay between exploration strategies and state abstractions in automated web GUI testing (AWGT), focusing on model-based, reinforcement learning (RL)-based, and large language model (LLM)-based approaches. The analysis reveals that no single strategy dominates across all metrics; instead, different strategies exhibit complementary strengths in code coverage, state coverage, and failure discovery. Notably, the effectiveness of state abstraction varies, with strict abstractions benefiting model-based methods and compact abstractions enhancing RL-based methods, while LLM performance is significantly influenced by the representation of historical context.

Key Contribution

No single exploration strategy outperforms the others across all dimensions of automated web GUI testing; model-based, RL-based, and LLM-based strategies instead show complementary strengths in code coverage, state coverage, and failure discovery.

Abstract

Automated web GUI testing (AWGT) relies on exploration strategies that exercise web applications through GUI actions to maximize code coverage, spanning traditional model-based, reinforcement learning (RL)-based, and emerging large language model (LLM)-based approaches. State abstraction, which detects pages with the same functionality to avoid repeated testing, has long been recognized as critical to guiding exploration. However, how exploration strategies and state abstractions jointly affect testing effectiveness remains underexplored. We present an empirical study analyzing both factors from the perspectives of code coverage and failure revelation. We compare representative model-based, RL-based, and LLM-based approaches; investigate how six state abstractions influence model-based and RL-based approaches; examine LLM-based approaches under different history representations, which act as a form of state abstraction; and compare the failures exposed by different approaches. Our results show that no single strategy excels across all dimensions; instead, categories exhibit complementary strengths in code coverage, state coverage, and failure discovery. State abstraction is a key factor: strict, fine-grained abstractions favor model-based strategies, while compact ones better support RL-based strategies. History representation substantially affects LLM-based strategies, where concise, functionality-level context performs best. We also find that code coverage is weakly correlated with failure-revealing ability, underscoring the need for multi-dimensional evaluation. These findings offer practical guidance for selecting exploration strategies and designing effective state abstractions for AWGT.
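To make the notion of state abstraction concrete, the following is a minimal Python sketch of two abstractions at different granularities. It is an illustration only and is not one of the six abstractions evaluated in the study: the function names strict_state and compact_state, and the choice of features (tag sequence plus visible text versus the set of distinct tags), are assumptions made for this example.

```python
import hashlib
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collects start-tag names and non-empty text fragments from a page."""

    def __init__(self):
        super().__init__()
        self.tags = []  # tag names in document order
        self.text = []  # stripped, non-empty text fragments

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def _collect(html):
    collector = _Collector()
    collector.feed(html)
    collector.close()
    return collector


def _digest(parts):
    # Hash the feature list into a fixed-size state identifier.
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def strict_state(html):
    """Hypothetical fine-grained abstraction: tag sequence plus visible text."""
    c = _collect(html)
    return _digest(c.tags + ["--text--"] + c.text)


def compact_state(html):
    """Hypothetical coarse abstraction: only the set of distinct tags."""
    c = _collect(html)
    return _digest(sorted(set(c.tags)))


if __name__ == "__main__":
    page_a = "<ul><li><a href='/item/1'>Item 1</a></li></ul>"
    page_b = ("<ul><li><a href='/item/2'>Item 2</a></li>"
              "<li><a href='/item/3'>Item 3</a></li></ul>")
    # The two list pages differ in content but share the same kinds of elements.
    print("strict equal: ", strict_state(page_a) == strict_state(page_b))
    print("compact equal:", compact_state(page_a) == compact_state(page_b))
```

Under these assumptions, the strict variant treats two list pages with different items as distinct states, while the compact variant merges them into one. This is the kind of granularity trade-off the abstract describes: a finer abstraction yields more states to explore, and a coarser one yields a smaller state space.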


Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
