This paper explores the use of Microsoft Copilot to automate aspects of software architecture evaluation, specifically in analyzing quality-attribute scenarios. The study compares Copilot's analysis of student-generated scenarios against evaluations by experienced architects and the original Architecture Tradeoff Analysis Method (ATAM) exercise. Results indicate that the LLM often produces more accurate and insightful risk assessments, sensitivity point identification, and tradeoff analyses while reducing effort, suggesting potential for AI-driven automation in architecture evaluation.
LLMs like Copilot can outperform experienced architects in identifying risks and tradeoffs in software architecture scenarios, hinting at a future where AI significantly streamlines design evaluation.
Architecture evaluation methods have been extensively used to evaluate software designs. Several evaluation methods have been proposed to analyze tradeoffs between different quality attributes. Moreover, competing qualities lead to conflicts when selecting which quality-attribute scenarios are the most suitable ones for an architecture to tackle. Consequently, the scenarios required by the stakeholders must be prioritized and also analyzed for potential risks. Today, architecture quality evaluation is still carried out manually, often involving long brainstorming sessions to decide on the most adequate quality-attribute scenarios for the architecture. To reduce this effort and make the assessment and selection of scenarios more efficient, in this research we propose the use of LLMs to partially automate the evaluation activities. As a first step in validating this hypothesis, this paper investigates MS Copilot as an LLM tool to analyze quality-attribute scenarios suggested by students and reviewed by experienced architects. Specifically, our study compares the results of an Architecture Tradeoff Analysis Method (ATAM) exercise conducted in a software architecture course with the assessments of experienced software architects and with the output produced by the LLM tool. Our initial findings reveal that, in most cases, the LLM produces more accurate and insightful results regarding the risks, sensitivity points, and tradeoffs of the manually generated quality scenarios, while significantly reducing the effort required for the task. Thus, we argue that generative AI has the potential to partially automate and support architecture evaluation tasks by suggesting higher-quality scenarios to be evaluated and recommending the most suitable ones for a given context.
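The abstract does not specify how scenarios were presented to Copilot beyond interactive use. As a rough illustration only, with all names hypothetical, the sketch below shows how a standard six-part quality-attribute scenario (source, stimulus, environment, artifact, response, response measure) could be rendered into a prompt requesting the three ATAM outputs the study compares: risks, sensitivity points, and tradeoffs.

```python
from dataclasses import dataclass


@dataclass
class QualityAttributeScenario:
    """Six-part quality-attribute scenario form commonly used in ATAM."""
    source: str            # who/what generates the stimulus
    stimulus: str          # the condition the system must respond to
    environment: str       # operating conditions (normal, overload, ...)
    artifact: str          # the part of the system being stimulated
    response: str          # the desired system behavior
    response_measure: str  # how the response is quantified


def build_analysis_prompt(scenario: QualityAttributeScenario) -> str:
    """Render a scenario into an LLM prompt asking for risks,
    sensitivity points, and tradeoffs (hypothetical prompt wording)."""
    return (
        "Act as an ATAM evaluator and analyze this quality-attribute scenario.\n"
        f"Source: {scenario.source}\n"
        f"Stimulus: {scenario.stimulus}\n"
        f"Environment: {scenario.environment}\n"
        f"Artifact: {scenario.artifact}\n"
        f"Response: {scenario.response}\n"
        f"Response measure: {scenario.response_measure}\n"
        "List the risks, sensitivity points, and tradeoffs this scenario implies."
    )


# Illustrative performance scenario, not taken from the study's data
scenario = QualityAttributeScenario(
    source="end user",
    stimulus="initiates 1000 concurrent transactions",
    environment="normal operation",
    artifact="order-processing service",
    response="all transactions are processed",
    response_measure="average latency under 2 seconds",
)
prompt = build_analysis_prompt(scenario)
```

The resulting `prompt` string could then be submitted to any LLM interface; the study used MS Copilot interactively rather than through code.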