Mar 18, 2026 · arXiv:2603.17504

Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination

AI Summary

The authors introduce HypoTermInstruct, a supervised fine-tuning (SFT) dataset of 31,487 responses to 11,151 questions about non-existent terms, designed to induce epistemological humility in LLMs. They fine-tuned Llama3.1-8B and Gemma3-4B (base and instruct variants) with LoRA across 100 configurations, comparing each run against a paired control. HypoTermInstruct improves the HypoTerm Score (median gains of up to 25.91%) and FactScore (up to +0.86%) while keeping MMLU performance nearly unchanged (decreases of 0.26% to 0.35%), indicating that targeted SFT can reduce hallucination.

Key Contribution

Targeted SFT on questions about non-existent terms can teach LLMs to say "I don't know", reducing hallucination at only a minimal cost on other tasks (MMLU decreases of 0.26% to 0.35%).

Abstract

Large language models (LLMs) often hallucinate, producing fluent but false information, partly because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce HypoTermInstruct, an SFT dataset (31,487 responses for 11,151 questions) designed to teach models epistemological humility: the ability to recognize the limits of their own knowledge and admit uncertainty. This is achieved through questions about non-existent "hypothetical" terms. We also release HypoTermQA-Enhanced, a benchmark for hallucination tendency strengthened through multiple validations. We conducted 800 controlled LoRA SFT runs across Llama3.1-8B and Gemma3-4B (base and instruct), testing 100 fine-tuning configurations with paired controls. Our results demonstrate that replacing generic instruction data with HypoTermInstruct significantly improves the HypoTerm Score (median increases of 0.19% to 25.91%) and FactScore (+0.39% to +0.86%), while maintaining stable performance on MMLU (minimal decreases of 0.26% to 0.35%). Our work demonstrates that targeted, high-quality SFT data teaching meta-cognitive skills can effectively reduce hallucination without preference/RL pipelines, providing mechanistic insights and a practical path toward more reliable AI systems.
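The paired-control comparison described in the abstract can be illustrated with a minimal Python sketch. It rests on two assumptions that the abstract does not spell out. First, a treatment run swaps a fraction of the generic instruction examples for HypoTermInstruct examples while keeping the dataset size fixed. Second, each treatment run is scored against a control run that shares its configuration, and the headline figure is the median of the per-configuration differences. The helper names `build_sft_mix` and `median_paired_gain`, the record fields, and all scores below are hypothetical placeholders, not the authors' released code or results.

```python
import random
from statistics import median


def build_sft_mix(generic, hypoterm, replace_frac, seed=0):
    """Hypothetical helper: replace a fraction of generic instruction examples
    with HypoTerm-style examples, keeping the total example count fixed."""
    if not 0.0 <= replace_frac <= 1.0:
        raise ValueError("replace_frac must be in [0, 1]")
    rng = random.Random(seed)
    n_total = len(generic)
    # Cap at the number of available HypoTerm examples.
    n_hypo = min(round(n_total * replace_frac), len(hypoterm))
    kept = rng.sample(generic, n_total - n_hypo)  # generic examples that survive
    added = rng.sample(hypoterm, n_hypo)          # substituted HypoTerm examples
    mix = kept + added
    rng.shuffle(mix)
    return mix


def median_paired_gain(treatment, control):
    """Hypothetical helper: median of per-configuration score differences
    (treatment minus its paired control)."""
    if treatment.keys() != control.keys():
        raise ValueError("every configuration needs a paired control")
    return median(treatment[cfg] - control[cfg] for cfg in treatment)


# Placeholder data: 100 generic examples and 40 questions about made-up terms.
generic = [{"prompt": f"generic task {i}", "response": "..."} for i in range(100)]
hypoterm = [
    {"prompt": f"What is hypoterm_{i}?",
     "response": "I'm not aware of this term; it may not exist."}
    for i in range(40)
]
mix = build_sft_mix(generic, hypoterm, replace_frac=0.25)

# Placeholder benchmark scores (percent) for three configurations.
treatment = {"cfg_a": 62.0, "cfg_b": 55.0, "cfg_c": 71.0}
control = {"cfg_a": 50.0, "cfg_b": 52.0, "cfg_c": 60.0}
gain = median_paired_gain(treatment, control)  # differences 12, 3, 11 -> 11.0
```

Summarizing with the median of paired differences, rather than a difference of averages, keeps a few unusual configurations from dominating the result. The exact mixing ratios and aggregation the authors used are not given in this abstract.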

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
