This paper introduces TRUSTMARGIN, a training-free arbitration layer that selects between answers drawn from parametric memory (Direct) and from retrieved evidence (RAG) when large language models (LLMs) answer knowledge-intensive questions. Using the model's own likelihoods, TRUSTMARGIN scores the existing Direct and RAG candidates and decides which source to trust when they conflict. Across 2WIKIMQA and CWQA at three LLaMA scales, it consistently improves over both Direct generation and BM25-RAG and recovers part of the Direct/RAG oracle gap, without fine-tuning, external judges, or additional generation.
TRUSTMARGIN improves answer quality by arbitrating between parametric memory and retrieved evidence when the two disagree, with no training or fine-tuning.
Large language models answer knowledge-intensive questions using both parametric memory and retrieved evidence, but neither source is uniformly reliable. Retrieval can fill knowledge gaps, yet distracting passages may override correct closed-book answers. We study this post-generation conflict as answer-level source arbitration: given Direct and RAG answers from the same frozen model, decide which source to trust. We propose TRUSTMARGIN, a training-free, plug-and-play arbitration layer that scores the two existing candidates with the model's own likelihoods. It combines a parametric-prior margin, which tests whether memory accepts the retrieved answer, with an evidence-binding margin, which discounts passage-only salience and measures question-specific support. TRUSTMARGIN selects between Direct and RAG without fine-tuning, external judges, or additional generation. Across 2WIKIMQA and CWQA with three LLaMA scales, TRUSTMARGIN consistently improves over Direct generation and BM25-RAG, recovers part of the Direct/RAG oracle gap, and generalizes to multiple training-free RAG pipelines.
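The abstract describes the two margins only in words, so the following is a minimal sketch of how such an arbitration step could look, not the paper's actual formulation. Several pieces are assumptions made for illustration: the `loglik(answer, question, passage)` scorer interface, the weighted-sum combination, the `weight` and `threshold` parameters, and the `table_scorer` helper that stands in for a frozen model.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical scorer interface: log p(answer | question, passage).
# Either conditioning field may be None to drop it from the prompt.
LogLik = Callable[[str, Optional[str], Optional[str]], float]


def prior_margin(loglik: LogLik, question: str, direct: str, rag: str) -> float:
    """Closed-book preference for the RAG answer over the Direct answer."""
    return loglik(rag, question, None) - loglik(direct, question, None)


def binding_margin(loglik: LogLik, question: str, passage: str, rag: str) -> float:
    """Support for the RAG answer given question and passage,
    minus how likely the passage alone makes it (passage-only salience)."""
    return loglik(rag, question, passage) - loglik(rag, None, passage)


def arbitrate(
    loglik: LogLik,
    question: str,
    passage: str,
    direct: str,
    rag: str,
    weight: float = 1.0,     # assumed mixing weight, not from the paper
    threshold: float = 0.0,  # assumed decision threshold, not from the paper
) -> str:
    """Return whichever existing candidate the combined margin favours."""
    if direct == rag:
        return direct  # no conflict, nothing to arbitrate
    score = prior_margin(loglik, question, direct, rag)
    score += weight * binding_margin(loglik, question, passage, rag)
    return rag if score > threshold else direct


def table_scorer(
    table: Dict[Tuple[str, Optional[str], Optional[str]], float]
) -> LogLik:
    """Toy stand-in for a frozen LLM: look up precomputed log-likelihoods."""
    def loglik(answer: str, question: Optional[str], passage: Optional[str]) -> float:
        return table[(answer, question, passage)]
    return loglik
```

In this sketch no new text is generated: the two candidates are only rescored, which mirrors the abstract's claim that selection needs no additional generation. A real scorer would sum token log-probabilities from the frozen model, and whether to length-normalize them is left open here.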