This paper introduces DocTrace, a multi-agent retrieval-augmented generation (RAG) framework for long-document question answering. It targets three limitations of existing structured RAG methods by combining query-triggered knowledge organization, document-structure awareness, and experience-guided reasoning. A lightweight document structural tree index preserves the document hierarchy, a hypergraph-structured working memory shared by the agents is built on demand during reasoning, and a graph-structured experience memory stores successful reasoning plans for reuse. Experiments on four long-document QA datasets show that DocTrace achieves the best performance on three of them, surpassing the strongest baseline, ComoRAG, by up to 8.85% in F1 and 4.40% in exact match (EM) while reducing overall computational cost by 53.32%.
DocTrace achieves the best results on three of four long-document QA datasets while cutting overall computational cost by more than half (53.32%) relative to the strongest baseline's setting.
Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depend on event order, section-level context, and cross-part evidence connections. Although retrieval-augmented generation (RAG) reduces the input context by retrieving relevant evidence, existing structured RAG methods still face three limitations: costly query-agnostic knowledge organization, insufficient use of original document structure, and no reuse of historical reasoning experience. To address these limitations, we propose DocTrace, a multi-agent RAG framework for long-document QA that supports query-triggered knowledge organization and document-structure-aware, experience-guided reasoning. DocTrace preserves document hierarchy with a lightweight document structural tree index, constructs agent-shared hypergraph-structured working memory on demand during reasoning, and stores successful reasoning plans in graph-structured experience memory for future reuse, enabling adaptive exploration across related long-document questions. Experiments on four long-document QA datasets show that DocTrace achieves the best performance on three datasets, surpassing the strongest baseline, ComoRAG, by up to 8.85% in F1 and 4.40% in EM, while reducing the overall computational cost by 53.32%.
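To make two of the data structures named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a structural tree whose nodes are sections, and a working memory whose hyperedges each link an arbitrary set of evidence items. All names (`SectionNode`, `HypergraphMemory`, `retrieve`) are hypothetical, and the keyword match stands in for whatever retriever DocTrace actually uses. The experience memory and the multi-agent control loop are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """One node of a hypothetical document structural tree (a heading plus its text)."""
    title: str
    text: str = ""
    children: list[SectionNode] = field(default_factory=list)

    def add(self, child: SectionNode) -> SectionNode:
        self.children.append(child)
        return child

    def walk(self, path: tuple[str, ...] = ()):
        """Yield (path, node) pairs in document order, keeping section context."""
        here = path + (self.title,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


@dataclass
class HypergraphMemory:
    """Hypothetical working memory: each hyperedge links any number of evidence ids."""
    hyperedges: dict[str, frozenset[str]] = field(default_factory=dict)

    def link(self, label: str, evidence_ids) -> None:
        self.hyperedges[label] = frozenset(evidence_ids)

    def related(self, evidence_id: str) -> set[str]:
        """Return all evidence ids sharing at least one hyperedge with evidence_id."""
        found: set[str] = set()
        for members in self.hyperedges.values():
            if evidence_id in members:
                found |= members
        found.discard(evidence_id)
        return found


def retrieve(root: SectionNode, query: str) -> list[tuple[str, ...]]:
    """Naive keyword match (an assumption); returns the section path of each hit."""
    terms = query.lower().split()
    return [path for path, node in root.walk()
            if any(term in node.text.lower() for term in terms)]


# Build a tiny two-section document and query it.
doc = SectionNode("Report")
doc.add(SectionNode("1 Introduction", "The merger was announced in March."))
doc.add(SectionNode("2 Results", "Revenue rose after the merger closed."))
hits = retrieve(doc, "merger")

# Link three pieces of evidence with a single hyperedge, created only when needed.
memory = HypergraphMemory()
memory.link("merger timeline", ["intro", "results", "appendix"])
neighbors = memory.related("intro")
```

Because each hit carries its full section path, a downstream reasoner can recover section-level context and document order. A single hyperedge can connect more than two evidence items, which an ordinary pairwise graph edge cannot express directly.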