Huawei, NYU | Apr 20, 2026 | arXiv:2604.17948

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan, Asini Subanya, Boubacar Ballo, Kashish Satija, Mariam Shafey, Mohamed Mahmoud, M. Mahmoud, Moncif Dahaji Bouffi, M. Bouffi, Pasindu Wickramasinghe, Siyona Goel, Yaakulya Sabbani, Hakim Hacid, Mthandazo Ndhlovu, Eleanna Kafeza, Sanjay Rawat, Muhammad Shafique

AI Summary

RAVEN is a framework that automates the generation of vulnerability analysis reports by combining LLM agents with Retrieval-Augmented Generation (RAG). It uses four modules: vulnerability identification, knowledge retrieval from curated databases such as Google Project Zero reports and CWE entries, impact assessment, and structured report generation. Evaluated on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset, RAVEN achieved an average report quality score of 54.21%.
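The four-module flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`explore`, `retrieve`, `analyze`, `write_report`), the two-entry in-memory knowledge base, and the word-overlap ranking that stands in for real retrieval. Where the actual system would call an LLM, a plain function stub is used instead.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    cwe_id: str        # CWE label guessed by the Explorer stage
    description: str   # short root-cause description


@dataclass
class Report:
    finding: Finding
    references: list[str] = field(default_factory=list)
    impact: str = ""
    text: str = ""


# Hypothetical stand-in for the curated databases (Project Zero reports, CWE entries).
KNOWLEDGE_BASE = {
    "CWE-121 stack-based buffer overflow":
        "Writes past a stack buffer can overwrite the return address.",
    "CWE-416 use after free":
        "Dereferencing freed memory can lead to code execution.",
}


def explore(source: str) -> Finding:
    """Explorer stub: a string check stands in for an LLM agent."""
    if "strcpy" in source:
        return Finding("CWE-121", "unbounded strcpy into a fixed-size stack buffer")
    return Finding("UNKNOWN", "no pattern matched")


def retrieve(finding: Finding, top_k: int = 1) -> list[str]:
    """Retrieval stub: rank entries by word overlap (not embedding search)."""
    query = set(f"{finding.cwe_id} {finding.description}".lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda kv: len(query & set(kv[0].lower().split())),
        reverse=True,
    )
    return [text for _, text in ranked[:top_k]]


def analyze(finding: Finding, references: list[str]) -> str:
    """Analyst stub: combine the finding with the retrieved context."""
    return f"{finding.description}. Context: {' '.join(references)}"


def write_report(finding: Finding, references: list[str], impact: str) -> Report:
    """Reporter stub: emit a small structured report as plain text."""
    text = "\n".join([
        f"Vulnerability class: {finding.cwe_id}",
        f"Root cause: {finding.description}",
        f"Impact: {impact}",
    ])
    return Report(finding, references, impact, text)


def run_pipeline(source: str) -> Report:
    """Chain the four stages: explore, retrieve, analyze, report."""
    finding = explore(source)
    references = retrieve(finding)
    impact = analyze(finding, references)
    return write_report(finding, references, impact)


report = run_pipeline("char buf[8]; strcpy(buf, argv[1]);")
print(report.text)
```

The point of the sketch is only the data flow: each stage consumes the previous stage's output, and retrieval sits between identification and impact assessment so the Analyst works with external context rather than the source code alone.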

Key Contribution

RAVEN shows how LLM agents combined with RAG can generate structured vulnerability analysis reports, reaching an average quality score of 54.21% on 105 vulnerable code samples from the NIST-SARD dataset.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential in automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval-Augmented Vulnerability Exploration Network), a framework leveraging LLM agents and Retrieval-Augmented Generation (RAG) to synthesize comprehensive vulnerability analysis reports. Given vulnerable source code, RAVEN generates reports following the Google Project Zero Root Cause Analysis template. The framework uses four modules: an Explorer agent for vulnerability identification, a RAG engine retrieving relevant knowledge from curated databases including Google Project Zero reports and CWE entries, an Analyst agent for impact and exploitation assessment, and a Reporter agent for structured report generation. To ensure quality, RAVEN includes a task-specific LLM Judge that evaluates reports across structural integrity, ground truth alignment, code reasoning quality, and remediation quality. We evaluate RAVEN on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset. Results show an average quality score of 54.21%, supporting the effectiveness of our approach for automated vulnerability documentation.
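As a rough illustration of how per-report judge scores could roll up into a single average quality percentage, the sketch below takes one score per evaluation dimension and averages them. The abstract does not state how the four dimensions are scaled or weighted, so the [0, 1] scale, the equal weighting, and the names `report_score` and `average_quality` are assumptions made here, not the authors' method.

```python
from statistics import mean

# The four dimensions named in the abstract; the identifiers are illustrative.
DIMENSIONS = (
    "structural_integrity",
    "ground_truth_alignment",
    "code_reasoning_quality",
    "remediation_quality",
)


def report_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the four dimension scores, as a percentage.

    Assumes each score is already normalized to [0, 1] (an assumption,
    not something stated in the source).
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"{d} must be in [0, 1], got {scores[d]}")
    return 100.0 * mean(scores[d] for d in DIMENSIONS)


def average_quality(all_scores: list[dict[str, float]]) -> float:
    """Average the per-report percentages over an evaluation set."""
    return mean(report_score(s) for s in all_scores)


# Two hypothetical judged reports (made-up numbers, not results from the paper).
judged = [
    {"structural_integrity": 1.0, "ground_truth_alignment": 0.5,
     "code_reasoning_quality": 0.5, "remediation_quality": 0.0},
    {"structural_integrity": 1.0, "ground_truth_alignment": 1.0,
     "code_reasoning_quality": 1.0, "remediation_quality": 1.0},
]
print(average_quality(judged))
```

A weighted mean or a different scale would change the numbers but not the structure: one score per dimension per report, then an average over the evaluation set.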

Code Generation & Program Synthesis | Recommendation & Information Retrieval | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
