The paper introduces GraphSteal, a novel black-box attack that reconstructs knowledge graphs from Graph RAG systems by querying the LLM and analyzing the retrieved structural evidence. It uses Depth-Wise Heuristic Search to extract node attributes and Breadth-Wise Diffusion Search to infer graph topology. Experiments show GraphSteal can recover over 90% of the original knowledge graph, exposing sensitive information and highlighting the vulnerability of Graph RAG to structural knowledge leakage.
Graph RAG systems leak almost all of their knowledge graph structure to black-box queries, even with existing safeguards.
Retrieval-Augmented Generation (RAG) enhances LLMs by grounding generation in query-relevant external evidence. Beyond unstructured text corpora, Graph RAG integrates knowledge graphs into the retrieval pipeline, enabling LLMs to access entities, relations, and multi-hop dependencies encoded in structured knowledge. However, the same structured knowledge that empowers Graph RAG also creates a new privacy attack surface. We demonstrate that Graph RAG systems can be turned into structural oracles: through adaptive black-box interactions, an adversary can elicit sufficient relational evidence to reconstruct substantial portions of the hidden knowledge graph. We propose a structure-oriented reconstruction framework that recovers targeted graphs from both local and global perspectives. Specifically, Depth-Wise Heuristic Search extracts fine-grained node attributes by recursively expanding entity-centered evidence, while Breadth-Wise Diffusion Search infers graph topology by propagating across relation-induced neighborhoods. Experiments on generic and healthcare scenarios demonstrate that our method can recover over 90% of the original knowledge graph from representative Graph RAG systems, revealing sensitive entities, relations, and structural dependencies with high fidelity. Existing guardrails provide limited defense against our attack, highlighting the inherent difficulty of safeguarding structural privacy in Graph RAG pipelines.
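
The "structural oracle" idea can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it uses a plain breadth-first expansion as a stand-in for neighborhood propagation, and everything in it (the `HIDDEN_GRAPH` data, `mock_oracle`, `reconstruct`, `edge_recall`) is a hypothetical name invented for illustration. A real attack would have to parse relational evidence out of free-form LLM responses, which this sketch skips by assuming the oracle returns clean (relation, neighbor) pairs.

```python
from collections import deque

# Hypothetical hidden knowledge graph: entity -> list of (relation, neighbor).
# In a real system the adversary never sees this structure directly.
HIDDEN_GRAPH = {
    "alice": [("treated_by", "dr_bob"), ("diagnosed_with", "asthma")],
    "dr_bob": [("works_at", "clinic_x")],
    "asthma": [("treated_with", "inhaler")],
    "clinic_x": [],
    "inhaler": [],
}


def mock_oracle(entity):
    """Stand-in for one black-box query: returns the relational evidence
    the system would reveal about a single entity (an assumption)."""
    return list(HIDDEN_GRAPH.get(entity, []))


def reconstruct(seed, oracle, max_queries=100):
    """Breadth-first expansion from a seed entity, recording every
    (head, relation, tail) triple the oracle reveals."""
    edges = set()
    visited = {seed}
    frontier = deque([seed])
    queries = 0
    while frontier and queries < max_queries:
        entity = frontier.popleft()
        queries += 1
        for relation, neighbor in oracle(entity):
            edges.add((entity, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return edges, queries


def edge_recall(recovered, graph):
    """Fraction of the true triples that the reconstruction recovered."""
    true_edges = {(h, r, t) for h, nbrs in graph.items() for r, t in nbrs}
    return len(recovered & true_edges) / len(true_edges)


edges, queries = reconstruct("alice", mock_oracle)
recall = edge_recall(edges, HIDDEN_GRAPH)
```

On this toy graph every edge is reachable from the seed, so the expansion recovers all of them. That outcome reflects only the idealized oracle and says nothing about the recovery rates reported in the paper. The `max_queries` budget is included because a black-box adversary typically pays per query.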