Zhongxiang Dai

Research focus

RLHF & Preference Learning (1)

Frequent co-authors

Zixuan Huang (1), Xin Xia (1), Yuxi Ren (1), Jianbin Zheng (1)

Papers (1)

Jan 30, 2026

Zixuan Huang +12 · Jan 30, 2026 · also UQ

Real-Time Aligned Reward Model beyond Semantics

Stop overfitting your reward model: R2M leverages real-time policy feedback to dynamically align the reward model with the evolving policy distribution, reducing reward overoptimization in RLHF.

Zixuan Huang, Xin Xia, Yuxi Ren +10

RLHF & Preference Learning
