This paper introduces MAAD (Multi-Agent Architecture Design), a framework that uses four specialized agents to autonomously convert requirements into detailed architectural blueprints. It incorporates architectural standards through retrieval and uses a hierarchical memory for iterative refinement. The study reports that MAAD outperforms MetaGPT by generating more complete, modular, and traceable architectures, based on quantitative metrics and qualitative feedback from industry architects. MAAD's effectiveness also depends strongly on the reasoning capabilities of the underlying LLM, with GPT-5.2 and Qwen3.5 performing best across most architecture design evaluation settings.
MAAD not only automates architecture design but also produces higher-quality outputs than the MetaGPT baseline through collaboration among specialized agents, with output quality further depending on the reasoning strength of the underlying LLM.
Software architecture design is a critical yet inherently complex and knowledge-intensive phase that requires balancing competing quality attributes and adapting to evolving requirements. Traditionally, this process has been time-consuming, labor-intensive, and heavily reliant on architects, often resulting in limited exploration of alternative architectural decompositions and styles, especially under the pressures of agile development. While LLM-based agents have shown promising performance across various software engineering tasks, their application to architecture design remains relatively scarce and requires systematic exploration. To address these challenges, we propose MAAD (Multi-Agent Architecture Design), a knowledge-driven framework that orchestrates four specialized agents (i.e., Analyst, Modeler, Designer, and Evaluator) to autonomously and collaboratively transform requirements specifications into comprehensive, multi-view architectural blueprints with quality attribute assessments. MAAD incorporates retrieval-augmented generation (RAG) to inject recognized architectural standards and patterns into the workflow and leverages a hierarchical memory mechanism that captures design history for iterative refinement. We evaluated MAAD through comparative experiments against MetaGPT, using quantitative architecture-level metrics across 10 case studies and qualitative feedback from industry architects on 10 real-world specifications. Results show that MAAD generates more complete, modular, and traceable architectures than the baseline, and its dedicated Evaluator agent autonomously produces structured quality evaluation reports that significantly reduce manual validation effort. Furthermore, we found that the quality of the generated architecture heavily depends on the underlying LLM's reasoning capacity, with GPT-5.2 and Qwen3.5 outperforming other models across most evaluation settings.
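The abstract describes the workflow only at a high level. Below is a minimal Python sketch, under stated assumptions, of how an Analyst → Modeler → Designer → Evaluator loop with retrieval and a memory of past iterations might be wired together. It is not the authors' implementation. The `llm` callable interface, the toy `PATTERN_KB`, the word-overlap `retrieve` function (a stand-in for embedding-based RAG), the two-level `HierarchicalMemory`, the `SCORE=` report format, and the `stub_llm` used for the demo are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical knowledge base standing in for the architectural standards/patterns retrieved via RAG.
PATTERN_KB = {
    "layered": "Layered style: separate presentation, business, and data layers.",
    "microservices": "Microservices: independently deployable services around business capabilities.",
    "event-driven": "Event-driven: components communicate via asynchronous events.",
}

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, kb: dict[str, str], k: int = 2) -> list[str]:
    """Toy retrieval: rank entries by word overlap with the query (not a real embedding search)."""
    q = _tokens(query)
    return sorted(kb.values(), key=lambda doc: len(q & _tokens(doc)), reverse=True)[:k]

@dataclass
class HierarchicalMemory:
    """Assumed two levels: full per-iteration agent outputs, plus short summaries fed back as context."""
    episodes: list[dict[str, str]] = field(default_factory=list)
    summaries: list[str] = field(default_factory=list)

    def record(self, iteration: int, outputs: dict[str, str], feedback: str) -> None:
        self.episodes.append(outputs)
        self.summaries.append(f"iter {iteration}: {feedback}")

    def context(self, last_n: int = 3) -> str:
        return " | ".join(self.summaries[-last_n:])

LLM = Callable[[str, str], str]  # hypothetical interface: (role, prompt) -> completion

def design_architecture(requirements: str, llm: LLM, threshold: float = 0.8, max_iters: int = 3):
    memory = HierarchicalMemory()
    design, report = "", ""
    for it in range(1, max_iters + 1):
        history = memory.context()
        analysis = llm("Analyst", f"Extract functional and quality requirements:\n{requirements}\nHistory: {history}")
        model = llm("Modeler", f"Build a domain model from:\n{analysis}")
        patterns = "\n".join(retrieve(analysis, PATTERN_KB))
        design = llm("Designer", f"Produce a multi-view architecture for:\n{model}\nApply patterns:\n{patterns}\nHistory: {history}")
        report = llm("Evaluator", f"Assess quality attributes of:\n{design}")
        # Assumed report format: "SCORE=<float>; <free-text feedback>"
        score = float(report.split(";", 1)[0].removeprefix("SCORE="))
        memory.record(it, {"analysis": analysis, "model": model, "design": design, "report": report}, report)
        if score >= threshold:
            return design, report, it
    return design, report, max_iters

def stub_llm(role: str, prompt: str) -> str:
    """Deterministic stand-in for a real model: the design improves once feedback appears in history."""
    if role == "Designer":
        return "refined design" if "iter 1" in prompt else "initial design"
    if role == "Evaluator":
        return "SCORE=0.9; ok" if "refined" in prompt else "SCORE=0.5; missing traceability"
    return f"{role} output"

design, report, iters = design_architecture("Users place orders; the system must scale", stub_llm)
print(design, "|", report, "| iterations:", iters)
```

With the stub, the first Evaluator report falls below the threshold, its feedback is stored in memory, and the second iteration's Designer prompt carries that history, so the loop stops after two iterations. Feeding compact summaries rather than full transcripts back into prompts is one plausible way to keep context size bounded; how MAAD actually structures its memory is not specified here.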