The authors introduce Faithfulness-QA, a dataset of 99,094 question-answer pairs designed to train RAG models to prioritize retrieved context over parametric knowledge. The dataset is constructed by systematically substituting answer-bearing entities in existing QA datasets with type-consistent alternatives, creating controlled conflicts between context and model knowledge. Experiments demonstrate its utility for training models to attend more faithfully to the provided context.
RAG models struggle to ignore their pre-trained knowledge, even when it contradicts the provided context, but a new dataset can help them learn to be more faithful.
Retrieval-Augmented Generation (RAG) models frequently produce answers grounded in parametric memory rather than the retrieved context, undermining the core promise of retrieval augmentation. A fundamental obstacle to fixing this unfaithfulness is the lack of training data that explicitly requires models to prefer context over internal knowledge. We introduce Faithfulness-QA, a large-scale dataset of 99,094 samples constructed through counterfactual entity substitution. Starting from two established extractive QA benchmarks, SQuAD and TriviaQA, we automatically identify answer-bearing named entities in each context, replace them with type-consistent alternatives drawn from a curated bank of 76,953 entities, and thereby manufacture controlled knowledge conflicts between context and parametric memory. Rigorous quality filtering ensures 100% pass rates across four automated checks on random 200-sample audits. We release the full dataset, the construction pipeline, and a typed entity bank covering eight named entity categories. Faithfulness-QA is designed as a training resource for attention-based faithfulness objectives and as an evaluation benchmark for measuring context-grounding behavior in RAG systems. Data and code are available at https://github.com/qzhangFDU/faithfulness-qa-dataset.
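The counterfactual entity substitution described in the abstract can be illustrated with a minimal Python sketch. This is not the released pipeline: the function name `substitute_answer`, the toy `ENTITY_BANK`, the sample fields, and the two safety conditions (the substitute differs from the original answer and does not already occur in the context) are assumptions made for illustration, and the entity type is passed in by hand rather than detected by a named entity recognizer.

```python
import random

# Hypothetical toy bank keyed by entity type. The released bank is far
# larger (76,953 entities across eight categories).
ENTITY_BANK = {
    "PERSON": ["Marie Curie", "Alan Turing", "Ada Lovelace"],
    "GPE": ["Paris", "Nairobi", "Lima"],
}


def substitute_answer(context, question, answer, entity_type, rng):
    """Build one counterfactual sample, or return None if no safe swap exists.

    The answer entity is replaced everywhere in the context by another
    entity of the same type, so the context now contradicts what a model
    is likely to have memorized.
    """
    # Extractive QA: the answer must appear verbatim in the context.
    if answer not in context:
        return None

    lowered_context = context.lower()
    candidates = [
        entity
        for entity in ENTITY_BANK.get(entity_type, [])
        # Illustrative conditions (assumptions, not the paper's four checks):
        # the substitute must differ from the original answer and must not
        # already occur in the context.
        if entity.lower() != answer.lower()
        and entity.lower() not in lowered_context
    ]
    if not candidates:
        return None

    new_answer = rng.choice(candidates)
    return {
        "question": question,
        "context": context.replace(answer, new_answer),
        "answer": new_answer,
        "original_answer": answer,
        "entity_type": entity_type,
    }


if __name__ == "__main__":
    sample = substitute_answer(
        context="The Eiffel Tower is located in Paris, the capital of France.",
        question="In which city is the Eiffel Tower located?",
        answer="Paris",
        entity_type="GPE",
        rng=random.Random(0),
    )
    print(sample)
```

A model that answers "Paris" on the resulting sample is relying on parametric memory, while a context-faithful model returns the substituted city. Keeping `original_answer` alongside the new answer is a design choice of this sketch that makes both behaviors easy to score.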