Independent Researcher · Vector · Jan 6, 2026 · arXiv:2601.03027

Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

Sindhuja Chaduvula, Ahmed Y. Radwan, Azib Farooq, Yani Ioannou, Shaina Raza

AI Summary

The paper introduces Factuality-aware Direct Preference Optimization (F-DPO), an extension of DPO designed to mitigate hallucinations in LLMs by incorporating binary factuality labels into the preference learning process. F-DPO counters the tendency of preference alignment methods to reinforce hallucinations in two ways: it applies a label-flipping transformation to correct misordered preference pairs, and it adds a factuality-aware margin that emphasizes pairs with clear correctness differences. Experiments across seven open-weight LLMs (1B to 14B parameters) show that F-DPO substantially improves factuality and reduces hallucination rates compared to both the base models and standard DPO, and that it generalizes to out-of-distribution benchmarks such as TruthfulQA.
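The two ingredients described above can be sketched as a per-pair loss. The exact F-DPO objective is not given on this page, so this is only a minimal sketch under stated assumptions: it uses the standard DPO logistic loss on sequence-level log-probabilities, treats labels as 1 = factual and 0 = hallucinated, and models the factuality-aware margin as a constant `margin` subtracted from the DPO logit when the pair has a factuality gap. The names `FactualityPair`, `flip_if_misordered`, and `f_dpo_loss`, and the hyperparameters `beta` and `margin`, are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class FactualityPair:
    """Hypothetical record: one preference pair plus binary factuality labels.

    Log-probabilities are sequence-level sums under the trained policy and a
    frozen reference model; labels use 1 = factual, 0 = hallucinated.
    """
    policy_logp_chosen: float
    policy_logp_rejected: float
    ref_logp_chosen: float
    ref_logp_rejected: float
    factual_chosen: int
    factual_rejected: int


def flip_if_misordered(pair: FactualityPair) -> FactualityPair:
    """Label flipping: swap chosen/rejected when the chosen response is less factual."""
    if pair.factual_chosen < pair.factual_rejected:
        return FactualityPair(
            policy_logp_chosen=pair.policy_logp_rejected,
            policy_logp_rejected=pair.policy_logp_chosen,
            ref_logp_chosen=pair.ref_logp_rejected,
            ref_logp_rejected=pair.ref_logp_chosen,
            factual_chosen=pair.factual_rejected,
            factual_rejected=pair.factual_chosen,
        )
    return pair


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def f_dpo_loss(pair: FactualityPair, beta: float = 0.1, margin: float = 1.0) -> float:
    """Sketch of a factuality-aware DPO loss for a single pair.

    ASSUMED form: -log sigmoid(beta * reward_gap - margin * factuality_gap).
    This is an illustration, not the paper's exact objective.
    """
    p = flip_if_misordered(pair)
    # Implicit reward difference, computed exactly as in standard DPO.
    reward_gap = (p.policy_logp_chosen - p.ref_logp_chosen) - (
        p.policy_logp_rejected - p.ref_logp_rejected
    )
    # After flipping, this is 0 (same factuality) or 1 (factual vs. hallucinated),
    # so pairs with equal labels get plain DPO.
    factuality_gap = p.factual_chosen - p.factual_rejected
    return -log_sigmoid(beta * reward_gap - margin * factuality_gap)


# Usage: a misordered pair (chosen is hallucinated, rejected is factual).
pair = FactualityPair(-10.0, -12.0, -11.0, -11.5, factual_chosen=0, factual_rejected=1)
print(f_dpo_loss(pair))
```

When both labels match, the factuality gap is 0 and the loss is identical to DPO. When the chosen response is factual and the rejected one is not, subtracting the margin means the policy must open a larger implicit reward gap before the loss shrinks, which is one plausible way to "emphasize" such pairs. Here the flip is applied inside the loss for compactness; in practice it would more likely be applied once when the preference dataset is built, with the loss computed in batches in an autodiff framework.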

Key Contribution

A simple modification to DPO that prioritizes factual correctness over fluency can cut LLM hallucination rates roughly fivefold (on Qwen3-8B, from 0.424 to 0.084).

Abstract

Preference alignment methods such as RLHF and Direct Preference Optimization (DPO) improve instruction following, but they can also reinforce hallucinations when preference judgments reward fluency and confidence over factual correctness. We introduce F-DPO (Factuality-aware Direct Preference Optimization), a simple extension of DPO that uses only binary factuality labels. F-DPO (i) applies a label-flipping transformation that corrects misordered preference pairs so that the chosen response is never less factual than the rejected one, and (ii) adds a factuality-aware margin that emphasizes pairs with clear correctness differences, while reducing to standard DPO when both responses share the same factuality label. We construct factuality-aware preference data by augmenting DPO pairs with binary factuality indicators and synthetic hallucinated variants. Across seven open-weight LLMs (1B to 14B parameters), F-DPO consistently improves factuality and reduces hallucination rates relative to both the base models and standard DPO. On Qwen3-8B, F-DPO reduces the hallucination rate fivefold (from 0.424 to 0.084) while improving the factuality score by 50 percent (from 5.26 to 7.90). F-DPO also generalizes to out-of-distribution benchmarks: on TruthfulQA, Qwen2.5-14B gains 17 percent in MC1 accuracy (0.500 to 0.585) and 49 percent in MC2 accuracy (0.357 to 0.531), both relative improvements. F-DPO requires no auxiliary reward model, token-level annotations, or multi-stage training.
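The relative figures in the abstract follow directly from the raw scores it quotes. The short check below uses only those numbers; the helper names `relative_change` and `fold_reduction` are illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change (after - before) / before; negative means a decrease."""
    return (after - before) / before


def fold_reduction(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


# Scores quoted in the abstract, as (before, after) pairs.
qwen3_8b_hallucination = (0.424, 0.084)
qwen3_8b_factuality = (5.26, 7.90)
truthfulqa_mc1 = (0.500, 0.585)
truthfulqa_mc2 = (0.357, 0.531)

print(f"Hallucination rate: {fold_reduction(*qwen3_8b_hallucination):.2f}x lower")  # 5.05x lower
print(f"Factuality score: {relative_change(*qwen3_8b_factuality):+.1%}")  # +50.2%
print(f"TruthfulQA MC1: {relative_change(*truthfulqa_mc1):+.1%}")  # +17.0%
print(f"TruthfulQA MC2: {relative_change(*truthfulqa_mc2):+.1%}")  # +48.7%
```

The MC2 gain works out to about 48.7 percent, which the abstract rounds to 49 percent; the "fivefold" hallucination reduction corresponds to a ratio of about 5.05.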

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · RLHF & Preference Learning

Citation Metrics

Citations: 2

Influential citations: 1

References: 31

Year: 2026

Venue: arXiv.org
