DPO's reliance on a reference policy can backfire: when the reference is pessimistically wrong, learning halts prematurely. A simple one-line fix significantly improves performance.
LLMs learn better from AI *reward* than from AI *preference*, yielding higher human-AI agreement and improved performance compared to standard online AI feedback and RLHF.