Interdisciplinary Transformation | Passau | Apr 30, 2026 | arXiv:2604.27306

NuggetIndex: Governed Atomic Retrieval for Maintainable RAG

Saber Zerhoudi, Michael Granitzer, Jelena Mitrović

AI Summary

NuggetIndex is a RAG system that retrieves atomic information units ("nuggets") instead of passages. To handle evolving corpora, it tracks evidence links, a temporal validity interval, and a lifecycle state for each nugget, and it filters invalid or deprecated nuggets before ranking so that outdated information is not retrieved. In experiments on a nuggetized MS MARCO subset, a temporal Wikipedia QA dataset, and a multi-hop QA task, NuggetIndex improves nugget recall by 42%, increases temporal correctness by 9 percentage points, and reduces conflict rates by 55% compared to passage and unmanaged proposition retrieval baselines.

Key Contribution

Stop retrieving passages in your RAG system: NuggetIndex shows that retrieving and filtering atomic "nuggets" of information substantially improves recall and temporal correctness while reducing conflicts.

Abstract

Retrieval-augmented generation (RAG) systems are frequently evaluated via fact-based metrics, yet standard implementations retrieve passages or static propositions. This unit mismatch between evaluation and retrieval objects hinders maintenance when corpora evolve and fails to capture superseded facts or source disagreements. We propose NuggetIndex, a retrieval system that stores atomic information units, so-called nuggets, as managed records. Each record maintains links to evidence, a temporal validity interval, and a lifecycle state. By filtering invalid or deprecated nuggets prior to ranking, the system prevents the inclusion of outdated information. We evaluate the approach using a nuggetized MS MARCO subset, a temporal Wikipedia QA dataset, and a multi-hop QA task. Against passage and unmanaged proposition retrieval baselines, NuggetIndex improves nugget recall by 42%, increases temporal correctness by 9 percentage points without the recall collapse observed in time-filtered baselines, and reduces conflict rates by 55%. The compact nugget format reduces generator input length by 64% while enabling lightweight index structures suitable for browser-based and resource-constrained deployment. We release our implementation, datasets, and evaluation scripts.
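The filter-before-rank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Nugget` fields, the `State` values, the closed validity interval, the token-overlap scorer, and the example records are all assumptions made for illustration, since the abstract does not specify the schema or the ranker.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class State(Enum):
    # Hypothetical lifecycle states; the abstract only mentions "deprecated".
    ACTIVE = "active"
    DEPRECATED = "deprecated"


@dataclass
class Nugget:
    text: str                         # the atomic fact
    evidence: List[str]               # ids of supporting sources (assumed form)
    valid_from: date                  # start of the validity interval
    valid_to: Optional[date] = None   # None means "still valid"
    state: State = State.ACTIVE

    def is_valid_at(self, as_of: date) -> bool:
        """True if the nugget is active and as_of lies in its validity interval."""
        if self.state is not State.ACTIVE:
            return False
        if as_of < self.valid_from:
            return False
        return self.valid_to is None or as_of <= self.valid_to


def overlap_score(query: str, text: str) -> int:
    # Stand-in ranker: number of shared lowercase tokens (an assumption).
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(nuggets: List[Nugget], query: str, as_of: date, k: int = 3) -> List[Nugget]:
    # Step 1: drop deprecated or out-of-interval nuggets BEFORE ranking.
    candidates = [n for n in nuggets if n.is_valid_at(as_of)]
    # Step 2: rank only the surviving candidates.
    ranked = sorted(candidates, key=lambda n: overlap_score(query, n.text), reverse=True)
    return ranked[:k]


# Made-up example records, not taken from the paper's datasets.
INDEX = [
    Nugget("Alice is the CEO of ExampleCorp", ["doc-1"], date(2015, 1, 1), date(2020, 12, 31)),
    Nugget("Bob is the CEO of ExampleCorp", ["doc-2"], date(2021, 1, 1)),
    Nugget("Carol is the CEO of ExampleCorp", ["doc-3"], date(2021, 1, 1), state=State.DEPRECATED),
]

if __name__ == "__main__":
    for hit in retrieve(INDEX, "who is the CEO of ExampleCorp", date(2023, 6, 1)):
        print(hit.text, hit.evidence)
```

Because filtering happens before scoring, a superseded or deprecated nugget never competes with current ones, even when its text matches the query equally well.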

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
