NJU | Mar 17, 2026 | arXiv:2603.16455

Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval

Weiqing Li, Jinyue Guo, Yaqi Wang, Haiyang Xiao, Yuewei Zhang, Guohua Liu, Hao Henry Wang

AI Summary

The paper introduces Evo-Retriever, a multimodal document retrieval framework in which an LLM guides curriculum evolution on top of a novel Viewpoint-Pathway collaboration. The collaboration combines multi-view image alignment for fine-grained matching with bidirectional contrastive learning that generates hard queries and complementary learning paths. By adaptively adjusting the training curriculum based on model-state summaries, Evo-Retriever achieves state-of-the-art performance on the ViDoRe V2 and MMEB (VisDoc) benchmarks.

Key Contribution

LLMs can dynamically optimize the training curriculum of multimodal retrieval models, leading to significant gains in retrieval accuracy by adapting to the model's evolving state.

Abstract

Visual-language models (VLMs) excel at data mappings, but real-world document heterogeneity and unstructuredness disrupt the consistency of cross-modal embeddings. Recent late-interaction methods enhance image-text alignment through multi-vector representations, yet traditional training with limited samples and static strategies cannot adapt to the model's dynamic evolution, causing cross-modal retrieval confusion. To overcome this, we introduce Evo-Retriever, a retrieval framework featuring an LLM-guided curriculum evolution built upon a novel Viewpoint-Pathway collaboration. First, we employ multi-view image alignment to enhance fine-grained matching via multi-scale and multi-directional perspectives. Then, a bidirectional contrastive learning strategy generates "hard queries" and establishes complementary learning paths for visual and textual disambiguation to rebalance supervision. Finally, the model-state summary from the above collaboration is fed into an LLM meta-controller, which adaptively adjusts the training curriculum using expert knowledge to promote the model's evolution. On ViDoRe V2 and MMEB (VisDoc), Evo-Retriever achieves state-of-the-art performance, with nDCG@5 scores of 65.2% and 77.1%, respectively.
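For readers unfamiliar with the nDCG@5 metric reported in the abstract, the following is a minimal sketch of how it is commonly computed for a single query. It is not taken from the paper: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is assumed to be linear in the relevance label (some benchmarks use an exponential gain instead), and the ideal ranking is assumed to be derivable from the relevance labels passed in.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain, assumed)."""
    # Position 0 is rank 1, so the discount is log2(rank + 1) = log2(index + 2).
    return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k for one query, given relevance labels in the order the system ranked them."""
    # The ideal ranking places the most relevant documents first.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # No relevant documents for this query (convention assumed here).
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical example: the single relevant page is ranked second.
    print(round(ndcg_at_k([0, 1, 0, 0, 0], k=5), 4))
```

A benchmark-level score is typically the mean of this per-query value over all queries; how ViDoRe V2 and MMEB (VisDoc) aggregate it is not specified in the abstract.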

Multimodal Models | Recommendation & Information Retrieval | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
