RippleGUItester is a change-driven GUI testing system that uses LLMs to analyze the impact of code changes and generate realistic test scenarios. It executes these scenarios on the pre-change and post-change versions and applies differential analysis with multimodal bug detection (visual GUI changes interpreted against natural-language change intents) to identify unintended behavioral differences. An evaluation on four software systems (Firefox, Zettlr, JabRef, Godot) revealed 26 previously unknown bugs, showing that the approach can uncover regressions missed by existing test suites, CI pipelines, and code review.
Change-driven GUI testing, powered by LLMs and multimodal analysis, finds real bugs that slip through existing test suites, CI pipelines, and code review.
Software systems evolve continuously through frequent code changes, yet such changes often introduce unintended bugs despite extensive testing and code review. Existing testing approaches are largely constrained to predefined execution paths or rely on unguided exploration, leaving many change-induced issues undetected. To address this challenge, we present RippleGUItester, a change-driven testing system that treats a code change as the epicenter of a ripple effect and explores its broader, user-visible impacts via the GUI. Given a code change, RippleGUItester performs LLM-based change-impact analysis to generate and enrich realistic test scenarios, executes these scenarios on both pre-change and post-change versions of the system, and applies differential analysis to identify behavioral differences. Crucially, RippleGUItester employs multimodal bug detection, comparing visual GUI changes and interpreting them in the context of natural-language change intents to distinguish unintended bugs from intended behavioral updates. We evaluate our approach on hundreds of real-world code changes across four widely used software systems: Firefox, Zettlr, JabRef, and Godot. Our results show that the proposed approach uncovers bugs introduced by code changes that were missed by existing test suites, CI pipelines, and code review. In total, we identify 26 previously unknown bugs that still exist in the latest versions of the evaluated systems. After reporting, 16 bugs have been fixed, 2 have been confirmed, 6 are still under discussion, and 2 were marked as intended. We envision RippleGUItester being applied before or shortly after a code change is merged, enabling earlier detection of regressions.
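The differential step described in the abstract can be illustrated with a minimal Python sketch. This is not RippleGUItester's implementation: every name here (`diff_runs`, `classify`, `Difference`, the toy runners) is hypothetical, GUI observations are reduced to plain strings instead of screenshots, and a keyword check stands in for the LLM that would interpret a difference against the change intent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: a scenario is a list of GUI step names, and a runner
# maps a scenario to one textual observation per step. A real system would
# drive the application and capture screenshots instead.
Scenario = List[str]
Runner = Callable[[Scenario], List[str]]


@dataclass
class Difference:
    step_index: int
    step: str
    before: str
    after: str


def diff_runs(scenario: Scenario, run_pre: Runner, run_post: Runner) -> List[Difference]:
    """Run the same scenario on both versions and collect per-step differences."""
    before = run_pre(scenario)
    after = run_post(scenario)
    return [
        Difference(i, step, b, a)
        for i, (step, b, a) in enumerate(zip(scenario, before, after))
        if b != a
    ]


def classify(
    diffs: List[Difference], is_intended: Callable[[Difference], bool]
) -> Dict[str, List[Difference]]:
    """Split differences into intended updates and suspicious ones."""
    report: Dict[str, List[Difference]] = {"intended": [], "suspicious": []}
    for d in diffs:
        report["intended" if is_intended(d) else "suspicious"].append(d)
    return report


# Toy data, invented for illustration only.
scenario = ["open_editor", "type_text", "export_pdf"]
intent = "Add a new toolbar to the editor"


def run_pre(s: Scenario) -> List[str]:
    return ["editor shown", "text visible", "pdf saved"]


def run_post(s: Scenario) -> List[str]:
    return ["editor shown with new toolbar", "text visible", "export dialog frozen"]


def is_intended(d: Difference) -> bool:
    # Keyword stand-in for the LLM judgment (an assumption, not the paper's method).
    return "toolbar" in intent.lower() and "toolbar" in d.after


diffs = diff_runs(scenario, run_pre, run_post)
report = classify(diffs, is_intended)
```

In this toy run, the toolbar change at `open_editor` matches the stated intent, while the changed behavior at `export_pdf` is left in the `suspicious` bucket as a candidate regression.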