This paper introduces Wasserstein Policy Regularization (WPR), a novel regularization technique for aligning LLMs with human preferences in RLHF, addressing the limitations of KL divergence by incorporating semantic similarity between tokens. WPR leverages the entropy-regularized Wasserstein distance to capture the geometry of the token space, resulting in penalty terms applied to the reward function via optimal dual variables. Experiments demonstrate that WPR outperforms KL- and f-divergence-based regularization methods, highlighting the effectiveness of semantic-aware policy distances for LLM alignment.
Ditch KL divergence in RLHF: Wasserstein Policy Regularization uses token geometry to align LLMs better with human preferences.
Large language models (LLMs) are commonly aligned with human preferences using reinforcement learning from human feedback (RLHF). In this framework, LLM policies are typically optimized through reward maximization with Kullback-Leibler (KL) divergence regularization toward the reference policy. However, KL and its $f$-divergence variants only compare token probabilities at identical indices, failing to capture semantic similarity between tokens. We propose Wasserstein Policy Regularization (WPR), a semantic-aware regularization for the RLHF framework based on the entropy-regularized Wasserstein distance, which incorporates the geometry of the token space. The dual formulation of the distance expresses the regularization as penalty terms applied to the reward via optimal dual variables, yielding a tractable objective compatible with standard RL algorithms. Empirically, our method outperforms KL- and $f$-divergence-based baselines, demonstrating the benefits of semantic-aware policy distances for alignment. Our code is available at https://github.com/aailab-kaist/WPR.
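To make the abstract's idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of the entropy-regularized Wasserstein distance between a policy's and a reference's next-token distributions. The cost matrix, embeddings, and function names are all illustrative assumptions; the Sinkhorn iterations recover the dual potentials that, in a WPR-style objective, would act as per-token penalties on the reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not from the paper): a small vocabulary with
# random unit-norm embeddings, and a cosine-distance cost so that semantically
# close tokens are cheap to transport between.
V, d = 8, 4
emb = rng.normal(size=(V, d))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
C = 1.0 - emb @ emb.T  # cost in [0, 2]; C[i, j] small for similar tokens

def sinkhorn(p, q, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT via Sinkhorn scaling.

    Returns the transport cost and the dual potentials (f, g),
    which play the role of the 'optimal dual variables' in the abstract.
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)         # alternate marginal projections
        u = p / (K @ v)
    plan = u[:, None] * K * v[None, :]
    f, g = eps * np.log(u), eps * np.log(v)
    return float((plan * C).sum()), f, g

# Policy and reference next-token distributions (random for illustration)
p = rng.dirichlet(np.ones(V))
q = rng.dirichlet(np.ones(V))

cost, f, g = sinkhorn(p, q, C)
# In a WPR-style objective, f could be added to the reward as a token-wise
# penalty, discouraging semantic drift from the reference policy q.
print(cost >= 0.0)
```

Unlike a KL penalty, which compares `p[i]` and `q[i]` only at the same index `i`, the transport cost here depends on the embedding geometry: moving mass to a semantically similar token is cheap, moving it to a distant one is expensive.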