This paper introduces EviProp, a retrieval method that improves evidence-page retrieval from long multimodal documents through seeded relevance diffusion on a Chunk-Page graph. It combines dense visual page priors with sparse chunk seeds and runs Personalized PageRank over the graph, capturing fine-grained signals and document-internal associations that independent page-by-page matching overlooks. Experiments on MMLongBench-Doc and LongDocURL show consistent gains in evidence-page retrieval and downstream question-answering accuracy, with negligible online retrieval overhead.
EviProp improves evidence-page retrieval by diffusing query relevance over a Chunk-Page graph of each document, outperforming independent visual retrieval and text-visual fusion baselines.
Retrieving evidence pages from visually rich long documents is a key challenge in document question answering. Existing page-level visual retrievers operate under an independent matching paradigm: each page is scored in isolation based on query-page similarity. This paradigm can under-rank evidence pages whose signals are localized in fine-grained chunks or depend on document-internal associations. We propose EviProp, a retrieval method that recovers such pages via seeded relevance diffusion. EviProp models each document as a multimodal Chunk-Page graph with hierarchical, sequential, and similarity links. Given a query, it combines dense visual page priors with sparse chunk seeds, then runs Personalized PageRank to diffuse relevance over the graph. Experiments on MMLongBench-Doc and LongDocURL show consistent gains in evidence-page retrieval over independent visual retrieval and text-visual fusion baselines. Downstream QA results further show that improved retrieval translates into better answer accuracy, with negligible online retrieval overhead. Our code is released at https://github.com/Flyecnu/EviProp.
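To make the diffusion step concrete, the following is a minimal sketch of seeded Personalized PageRank on a toy Chunk-Page graph. It is not the released EviProp implementation: the function names `mix_seeds` and `personalized_pagerank`, the mixing weight `lam`, the restart probability `alpha`, the unit edge weights, and the toy scores are all assumptions made for illustration.

```python
from collections import defaultdict


def mix_seeds(page_prior, chunk_seeds, lam=0.5):
    """Blend normalized page priors and chunk seeds into one restart vector.

    lam is a hypothetical mixing weight, not a value taken from the paper.
    """
    def normalize(d):
        s = sum(d.values())
        return {k: v / s for k, v in d.items()} if s > 0 else {}

    mixed = defaultdict(float)
    for node, v in normalize(page_prior).items():
        mixed[node] += lam * v
    for node, v in normalize(chunk_seeds).items():
        mixed[node] += (1 - lam) * v
    return dict(mixed)


def personalized_pagerank(edges, seeds, alpha=0.15, iters=100, tol=1e-10):
    """Power-iteration Personalized PageRank on an undirected weighted graph.

    edges: iterable of (u, v, weight); seeds: dict node -> non-negative mass.
    alpha is the restart probability (an assumed default).
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    nodes = sorted(set(adj) | set(seeds))
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seeds must carry positive total mass")
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            out = sum(adj[u].values())
            if out == 0.0:
                # Dangling node: send its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - alpha) * rank[u] * restart[n]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - alpha) * rank[u] * w / out
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Toy document: three pages, one chunk per page (all values are made up).
edges = [
    ("c1", "p1", 1.0), ("c2", "p2", 1.0), ("c3", "p3", 1.0),  # hierarchical
    ("p1", "p2", 1.0), ("p2", "p3", 1.0),                     # sequential
    ("c1", "c3", 1.0),                                        # similarity
]
page_prior = {"p1": 0.6, "p2": 0.3, "p3": 0.1}  # dense visual page scores
chunk_seeds = {"c3": 1.0}                       # sparse chunk match
scores = personalized_pagerank(edges, mix_seeds(page_prior, chunk_seeds))
ranked_pages = sorted((n for n in scores if n.startswith("p")),
                      key=scores.get, reverse=True)
print(ranked_pages)
```

In this sketch a page can gain relevance from a seeded chunk it contains, from a neighboring page, or from a chunk elsewhere in the document that is similar to one of its own, which is the kind of signal that scoring each page in isolation cannot use.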