This paper investigates the use of structured linked data, specifically Schema.org markup and dereferenceable entity pages, as a memory layer to improve retrieval accuracy and answer quality in both standard and agentic RAG systems. The authors compare three document representations (plain HTML, HTML with JSON-LD, and enhanced agentic-optimized entity pages) under two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal) across four domains. Results show that enhanced entity pages, which incorporate agent instructions and neural search, substantially improve accuracy (+29.6% for standard RAG and +29.8% for agentic RAG), while JSON-LD alone provides only modest gains.
Ditching flat text for structured linked data in RAG systems can boost accuracy by nearly 30%, but only if you go beyond basic JSON-LD and add agent-friendly instructions and neural search.
Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. In this paper, we investigate whether structured linked data, specifically Schema.org markup and dereferenceable entity pages served by a Linked Data Platform, can improve retrieval accuracy and answer quality in both standard and agentic RAG systems. We conduct a controlled experiment across four domains (editorial, legal, travel, e-commerce) using Vertex AI Vector Search 2.0 for retrieval and the Google Agent Development Kit (ADK) for agentic reasoning. Our experimental design tests seven conditions: three document representations (plain HTML, HTML with JSON-LD, and an enhanced agentic-optimized entity page) crossed with two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition that adds rich navigational affordances and entity interlinking. Our results reveal that while JSON-LD markup alone provides only modest improvements, our enhanced entity page format, incorporating llms.txt-style agent instructions, breadcrumbs, and neural search capabilities, achieves substantial gains: +29.6% accuracy improvement for standard RAG and +29.8% for the full agentic pipeline. The Enhanced+ variant, with richer navigational affordances, achieves the highest absolute scores (accuracy: 4.85/5, completeness: 4.55/5), though the incremental gain over the base enhanced format is not statistically significant. We release our dataset, evaluation framework, and enhanced entity page templates to support reproducibility.
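The seven-condition design described in the abstract can be read as a 3 x 2 grid plus one extra cell. The following is a minimal sketch of that enumeration, not the authors' evaluation code. All identifiers (`REPRESENTATIONS`, `RETRIEVAL_MODES`, `build_conditions`, and the string labels) are hypothetical names chosen for illustration. The abstract does not say which retrieval mode the Enhanced+ condition uses, so it is left unspecified here.

```python
from itertools import product

# Hypothetical labels for the three document representations in the abstract.
REPRESENTATIONS = ["plain_html", "html_jsonld", "enhanced_entity_page"]

# Hypothetical labels for the two retrieval modes in the abstract.
RETRIEVAL_MODES = ["standard_rag", "agentic_rag"]


def build_conditions():
    """Return the seven experimental conditions as a list of dicts."""
    # Cross every representation with every retrieval mode: 3 x 2 = 6 cells.
    conditions = [
        {"representation": rep, "retrieval": mode}
        for rep, mode in product(REPRESENTATIONS, RETRIEVAL_MODES)
    ]
    # Add the Enhanced+ condition. Its retrieval mode is not stated in the
    # abstract, so it is recorded as None (an assumption of this sketch).
    conditions.append({"representation": "enhanced_plus", "retrieval": None})
    return conditions


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition)
```

Keeping the conditions as plain dictionaries makes it straightforward to loop over them per domain (editorial, legal, travel, e-commerce) when collecting scores.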