This paper introduces Conformal Feedback Alignment (CFA), a framework that leverages conformal prediction to quantify the reliability of individual answers used in preference-based alignment. CFA constructs conformal prediction sets for each answer, aggregates these into reliability scores, and uses these scores as weights in DPO and PPO training. Experiments demonstrate that CFA improves alignment robustness and data efficiency by explicitly modeling answer-side uncertainty, complementing existing preference-level weighting schemes.
Forget weighting preferences alone – this new method uses conformal prediction to directly quantify and leverage the reliability of the *answers* themselves, leading to more robust and data-efficient LLM alignment.
Preference-based alignment methods such as Reinforcement Learning from Human Feedback (RLHF) learn from pairwise preferences, yet the labels are often noisy and inconsistent. Existing uncertainty-aware approaches weight preferences but ignore a more fundamental factor: the reliability of the *answers* being compared. To address this, we propose Conformal Feedback Alignment (CFA), a framework that grounds preference weighting in the statistical guarantees of Conformal Prediction (CP). CFA quantifies answer-level reliability by constructing conformal prediction sets with controllable coverage, then aggregates these reliabilities into principled weights for both DPO- and PPO-style training. Experiments across multiple datasets show that CFA improves alignment robustness and data efficiency, highlighting that modeling *answer-side* uncertainty complements preference-level weighting. Code is provided here.
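The paper does not spell out its nonconformity score or aggregation rule here, so the following is only a minimal NumPy sketch of the general recipe the abstract describes: score each answer with a split-conformal p-value (the standard reliability notion behind conformal prediction sets), combine the two answers' reliabilities into a per-pair weight, and use those weights in a DPO-style loss. The min-aggregation and the specific function names are illustrative assumptions, not CFA's actual design.

```python
import numpy as np

def conformal_p_value(score, cal_scores):
    """Split-conformal p-value of a nonconformity score: the fraction of
    calibration scores at least as extreme, with the +1 finite-sample
    correction. High nonconformity -> small p-value -> low reliability."""
    n = len(cal_scores)
    return (1.0 + np.sum(cal_scores >= score)) / (n + 1.0)

def pair_weight(rel_chosen, rel_rejected):
    # Assumed aggregation: a pair is only as trustworthy as its
    # least reliable answer. CFA's actual rule may differ.
    return min(rel_chosen, rel_rejected)

def weighted_dpo_loss(dlogp_policy, dlogp_ref, weights, beta=0.1):
    """DPO loss with per-pair reliability weights.
    dlogp_* = log p(chosen) - log p(rejected) under policy / reference."""
    logits = beta * (dlogp_policy - dlogp_ref)
    losses = -np.log(1.0 / (1.0 + np.exp(-logits)))  # -log sigmoid(logits)
    return float(np.average(losses, weights=weights))

# Illustrative usage on synthetic calibration scores.
rng = np.random.default_rng(0)
cal = rng.normal(size=200)                 # calibration nonconformity scores
rel_good = conformal_p_value(-2.0, cal)    # low nonconformity -> high reliability
rel_bad = conformal_p_value(3.0, cal)      # high nonconformity -> low reliability
w = pair_weight(rel_good, rel_bad)         # down-weights pairs with an unreliable answer
```

The key design point the abstract emphasizes survives even in this toy form: the weight attaches to the *answers* (via their conformal reliabilities), not to the preference label itself, so a pair containing an unreliable answer contributes less to the DPO gradient regardless of how confident the annotator was.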