The paper introduces SAGE, a framework for retrieval-augmented question answering over heterogeneous data (text, tables, and graph nodes). SAGE constructs a chunk-level graph offline using metadata-driven similarities and, at query time, retrieves evidence by expanding from seed chunks to their graph neighbors. This addresses the limitations of both costly knowledge graph traversal and flat similarity search, because it recovers multi-hop evidence chains that span modalities. Experiments on the OTT-QA and STaRK datasets show that SAGE improves retrieval recall by 5.7 and 8.5 points, respectively, over baseline methods.
Forget costly knowledge graphs: SAGE offers a lightweight, chunk-level graph expansion method that boosts retrieval recall by up to 8.5 points on heterogeneous QA tasks.
Retrieval-augmented question answering over heterogeneous corpora requires connected evidence across text, tables, and graph nodes. Entity-level knowledge graphs support structured access, but they are costly to construct and maintain and inefficient to traverse at query time. Standard retriever-reader pipelines, in contrast, run flat similarity search over independently chunked text and therefore miss multi-hop evidence chains that span modalities. We propose SAGE (Structure Aware Graph Expansion), a framework that (i) constructs a chunk-level graph offline using metadata-driven similarities with percentile-based pruning, and (ii) performs online retrieval by running an initial baseline retriever to obtain k seed chunks, expanding to their first-hop neighbors, and filtering those neighbors with dense+sparse retrieval to select k' additional chunks. We instantiate the initial retriever with hybrid dense+sparse retrieval for implicit cross-modal corpora and with SPARK (Structure Aware Planning Agent for Retrieval over Knowledge Graphs), an agentic retriever, for explicit schema graphs. On OTT-QA and STaRK, SAGE improves retrieval recall over baselines by 5.7 and 8.5 points, respectively.
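The two-stage procedure can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and every function name in it is hypothetical. The following are assumptions:

- The metadata similarity is a Jaccard overlap of metadata tokens.
- Pruning uses a single global percentile threshold over positive pairwise similarities.
- `hybrid_score` is a toy stand-in for dense+sparse retrieval. It uses term-count cosine in place of learned embeddings and query-term coverage in place of BM25.

The same toy scorer acts as both the seed retriever and the neighbor filter. SPARK is not modeled.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def metadata_similarity(meta_a, meta_b):
    """Hypothetical metadata similarity: Jaccard overlap of the tokens found in
    each chunk's metadata fields (e.g. page title, linked entities)."""
    a = {t for value in meta_a.values() for t in tokens(value)}
    b = {t for value in meta_b.values() for t in tokens(value)}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, p in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def build_chunk_graph(chunks, p=50):
    """Offline stage: link chunk pairs whose metadata similarity is at or above
    the p-th percentile of all positive pairwise similarities (assumed global
    threshold). Returns an undirected adjacency dict."""
    sims = {}
    for i, j in combinations(range(len(chunks)), 2):
        s = metadata_similarity(chunks[i]["meta"], chunks[j]["meta"])
        if s > 0:
            sims[(i, j)] = s
    graph = {i: set() for i in range(len(chunks))}
    if sims:
        threshold = percentile(list(sims.values()), p)
        for (i, j), s in sims.items():
            if s >= threshold:  # percentile-based pruning
                graph[i].add(j)
                graph[j].add(i)
    return graph


def hybrid_score(query, text, alpha=0.5):
    """Toy dense+sparse blend: cosine over term counts (stand-in for dense
    embeddings) mixed with query-term coverage (stand-in for BM25)."""
    q, d = Counter(tokens(query)), Counter(tokens(text))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    dense = dot / norm if norm else 0.0
    sparse = sum(1 for t in q if t in d) / len(q) if q else 0.0
    return alpha * dense + (1 - alpha) * sparse


def sage_retrieve(query, chunks, graph, k=2, k_prime=2):
    """Online stage: k seeds from the base retriever, first-hop expansion,
    then re-score the neighbors and keep the top k_prime of them."""
    def score(i):
        return hybrid_score(query, chunks[i]["text"])

    seeds = sorted(range(len(chunks)), key=score, reverse=True)[:k]
    # Sort neighbor ids first so ties are broken deterministically.
    neighbors = sorted({n for s in seeds for n in graph[s]} - set(seeds))
    extra = sorted(neighbors, key=score, reverse=True)[:k_prime]
    return seeds + extra


# Usage on a toy corpus (illustrative data, not from the paper).
chunks = [
    {"text": "Inception was directed by Christopher Nolan.",
     "meta": {"title": "Inception", "people": "Christopher Nolan"}},
    {"text": "Christopher Nolan was born in London.",
     "meta": {"title": "Christopher Nolan", "place": "London"}},
    {"text": "London is in England.", "meta": {"title": "London"}},
    {"text": "The director of Titanic was born in Canada.",
     "meta": {"title": "Titanic", "people": "James Cameron"}},
    {"text": "The Colosseum is in Rome.",
     "meta": {"title": "Colosseum", "place": "Rome"}},
]
graph = build_chunk_graph(chunks, p=50)
query = "Which country was the director of Inception born in?"
print(sage_retrieve(query, chunks, graph, k=2, k_prime=2))  # [3, 1, 0, 2]
```

On this toy corpus, flat top-4 retrieval with the same scorer returns `[3, 1, 4, 0]`. There, the distractor chunks 3 and 4 crowd out chunk 2, which carries the final hop (London to England). Expanding from the seeds `[3, 1]` instead reaches chunks 0 and 2 through the metadata edges. This is the kind of multi-hop chain the chunk graph is meant to recover.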