The paper introduces FGDM, a multi-agent framework that uses LLMs with Chain-of-Thought (COT) and Tree-of-Thoughts (TOT) prompting to detect and repair software bugs. FGDM constructs a flow graph of the code, identifies erroneous segments, and generates repaired code; it also uses a FAISS vector database to retrieve similar past bugs and their fixes. Experiments on 100 programs from diverse projects show that FGDM outperforms existing approaches, reducing Levenshtein distance by a mean of 24.33 for Python and 8.37 for C and reaching cosine similarities of 0.951 and 0.974, respectively.
Structuring LLMs as a sequential team of reasoning agents that work from a flow graph of the code lets them find and fix bugs in interconnected codebases more accurately than existing methods.
Deep learning methods are becoming prominent in automated software bug detection; however, they lack a global understanding of the code under analysis. Consequently, their performance tends to degrade, especially on large interconnected codebases or complex modular programs. Recently, Large Language Models (LLMs) have proven effective at capturing dependencies among multiple interconnected modules in a codebase. This motivated us to propose the Flow-Graph-Driven Multi-Agent Framework (FGDM), which is composed of four agents that operate sequentially. The framework converts the input code into a flow graph, identifies the erroneous segments, and then generates the repaired code. All agents use Chain-of-Thought (COT) and Tree-of-Thoughts (TOT) prompts. We also integrate the FAISS vector database to retrieve similar previous bugs and their repairs. We evaluated the framework on 100 programs, in both C and Python, drawn from several projects, including Ansible, Black, FastAPI, Keras, Luigi, Matplotlib, Pandas, Scrapy, SpaCy, and Tornado. Our experiments demonstrate that FGDM outperforms existing approaches, yielding mean reductions in Levenshtein distance of 24.33 and 8.37 and cosine similarities of 0.951 and 0.974 for Python and C, respectively.
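To make the two reported metrics concrete, the following is a minimal sketch of how a generated repair might be compared against a reference fix. It is not the authors' evaluation code. The abstract does not say how the cosine-similarity vectors are built, so the whitespace bag-of-tokens representation below is an assumption, and the function names are illustrative.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between token-count vectors.

    Assumption: tokens are whitespace-separated; the paper may use
    a different representation (for example, learned embeddings).
    """
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no tokens in one of the inputs
    return dot / (norm_a * norm_b)


reference_fix = "return a + b"   # hypothetical ground-truth repair
generated_fix = "return a - b"   # hypothetical model output

print(levenshtein(generated_fix, reference_fix))        # 1
print(cosine_similarity(generated_fix, reference_fix))  # 0.75
```

A lower Levenshtein distance and a cosine similarity closer to 1 both indicate that the generated repair is closer to the reference fix.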