TU Munich · Feb 22, 2026 · arXiv:2602.19294

Towards Automated Page Object Generation for Web Testing using Large Language Models

Betül Karagöz, Filippo Ricca, Matteo Biagiola, Andrea Stocco

AI Summary

This paper investigates the feasibility of using Large Language Models (LLMs), specifically GPT-4o and DeepSeek Coder, to automate the generation of Page Objects (POs) for web testing, a task traditionally performed manually. The study evaluates the accuracy and element recognition rate of LLM-generated POs against a benchmark of five web applications with manually written POs. Results indicate that LLMs can generate syntactically correct and functionally useful POs, achieving accuracy between 32.6% and 54.0% and element recognition rates exceeding 70% in most cases, highlighting both the potential and remaining challenges of LLMs in this domain.

Key Contribution

LLMs can automate Page Object generation for web testing, recognizing over 70% of ground truth elements in most cases, but accuracy remains limited (32.6% to 54.0%), revealing key areas for improvement in integrating LLMs into practical testing workflows.

Abstract

Page Objects (POs) are a widely adopted design pattern for improving the maintainability and scalability of automated end-to-end web tests. However, creating and maintaining POs is still largely a manual, labor-intensive activity, while automated solutions have seen limited practical adoption. In this context, the potential of Large Language Models (LLMs) for these tasks has remained largely unexplored. This paper presents an empirical study on the feasibility of using LLMs, specifically GPT-4o and DeepSeek Coder, to automatically generate POs for web testing. We evaluate the generated artifacts on an existing benchmark of five web applications for which manually written POs are available (the ground truth), focusing on accuracy (i.e., the proportion of ground truth elements correctly identified) and element recognition rate (i.e., the proportion of ground truth elements correctly identified or marked for modification). Our results show that LLMs can generate syntactically correct and functionally useful POs, with accuracy values ranging from 32.6% to 54.0% and element recognition rates exceeding 70% in most cases. Our study contributes the first systematic evaluation of LLMs' strengths and open challenges for automated PO generation, and provides directions for further research on integrating LLMs into practical testing workflows.
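The two metrics defined in the abstract can be illustrated with a minimal Python sketch. It assumes each ground truth element has already been given one of three labels; the `Verdict` names, the function names, and the ten-element example are hypothetical and are not taken from the paper, whose actual labeling protocol is not described in the abstract.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical label for one ground truth element (assumed, not from the paper)."""
    CORRECT = "correct"                 # element correctly identified in the generated PO
    NEEDS_MODIFICATION = "modification" # element recognized but marked for modification
    MISSING = "missing"                 # element not recognized at all


def accuracy(verdicts: list[Verdict]) -> float:
    """Proportion of ground truth elements correctly identified."""
    if not verdicts:
        raise ValueError("at least one ground truth element is required")
    correct = sum(1 for v in verdicts if v is Verdict.CORRECT)
    return correct / len(verdicts)


def element_recognition_rate(verdicts: list[Verdict]) -> float:
    """Proportion of ground truth elements correctly identified or marked for modification."""
    if not verdicts:
        raise ValueError("at least one ground truth element is required")
    recognized = sum(
        1 for v in verdicts
        if v in (Verdict.CORRECT, Verdict.NEEDS_MODIFICATION)
    )
    return recognized / len(verdicts)


# Made-up example: 10 ground truth elements, purely illustrative.
example = (
    [Verdict.CORRECT] * 4
    + [Verdict.NEEDS_MODIFICATION] * 4
    + [Verdict.MISSING] * 2
)

print(f"accuracy: {accuracy(example):.1%}")
print(f"element recognition rate: {element_recognition_rate(example):.1%}")
```

Because every correctly identified element also counts as recognized, the element recognition rate under these definitions can never be lower than the accuracy, which is consistent with the reported accuracy of 32.6% to 54.0% alongside recognition rates above 70%.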

Code Generation & Program Synthesis · Natural Language Processing · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
