Search papers, labs, and topics across Lattice.
The paper introduces TRACE, a novel framework for universal multimodal retrieval that unifies generative reasoning with discriminative representation learning by generating a Chain-of-Thought (CoT) and compressing it into a compact embedding. To train TRACE, the authors construct M-BEIR-CoT, a large-scale dataset with a difficulty-aware routing strategy. Experiments on the M-BEIR benchmark demonstrate that TRACE achieves state-of-the-art performance and exhibits learned implicit routing behavior, activating reasoning only for complex queries and demonstrating zero-shot transferability.
By generating and compressing Chain-of-Thought reasoning, TRACE achieves state-of-the-art multimodal retrieval and learns to adaptively route queries through reasoning only when necessary, balancing accuracy and efficiency.
Universal Multimodal Retrieval requires unified embedding models capable of interpreting diverse user intents, ranging from simple keywords to complex compositional instructions. While Multimodal Large Language Models (MLLMs) possess strong reasoning capabilities, prevailing adaptations confine them to static encoders, underutilizing their generative potential. This encoder-only paradigm struggles with complex intents that demand logical deduction rather than superficial pattern matching. To address this, we introduce TRACE (Task-adaptive Reasoning And Compressing Embeddings). TRACE unifies generative reasoning with discriminative representation learning. It first generates a structured Chain-of-Thought (CoT) to explicitly reason about the query, and subsequently compresses this reasoning trace into a compact embedding via a dedicated token. To train this framework, we construct M-BEIR-CoT, a large-scale dataset featuring a difficulty-aware routing strategy. Experiments on the M-BEIR benchmark establish TRACE as the new state-of-the-art. Crucially, TRACE demonstrates a learned implicit routing behavior. It autonomously activates reasoning for complex queries while bypassing it for simpler ones, achieving an optimal balance between retrieval accuracy and inference throughput. Furthermore, by internalizing the deductive process, TRACE exhibits remarkable zero-shot transferability to unseen domains and novel constraints.