Tsinghua University
RLHF's two-stage approach can statistically outperform DPO when learning from implicitly sparse rewards, challenging the narrative that end-to-end preference optimization is always superior.
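For context, the two approaches being compared differ in where the preference signal enters. A minimal sketch (function names and the `beta=0.1` default are illustrative, not from the source): DPO trains the policy end-to-end with a logistic loss on the implicit reward margin, while the first stage of RLHF fits an explicit reward model with the Bradley-Terry objective, and a second stage then optimizes that reward under a KL penalty.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """End-to-end DPO: -log sigmoid of the implicit reward margin
    between the preferred (w) and dispreferred (l) responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))

def reward_model_loss(r_w: float, r_l: float) -> float:
    """RLHF stage one: Bradley-Terry fit of an explicit reward model;
    stage two (not shown) maximizes this reward under a KL constraint."""
    return -math.log(sigmoid(r_w - r_l))

# With no preference signal, both losses reduce to log 2.
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0), 4))   # 0.6931
print(round(reward_model_loss(0.0, 0.0), 4))    # 0.6931
```

The structural difference the claim turns on is that the reward model in the two-stage pipeline is fit separately, so it can smooth or generalize a sparse preference signal before the policy ever sees it.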