Even reward models that reach the right answer can be dangerously wrong in their reasoning, leading to worse RLHF outcomes; R-Align fixes this by explicitly aligning rationales with gold-standard judgments.
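The summary doesn't spell out R-Align's training objective, so the following is only a rough sketch of what "aligning rationales with gold-standard judgments" could look like: a standard verdict loss plus a term pulling the model's rationale representation toward a gold rationale's embedding. The function name, the cosine-similarity pairing, and the `alpha` weight are all illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def rationale_alignment_loss(verdict_logits: torch.Tensor,
                             gold_verdicts: torch.Tensor,
                             rationale_emb: torch.Tensor,
                             gold_rationale_emb: torch.Tensor,
                             alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical sketch: combine judgment accuracy with rationale alignment."""
    # Standard judgment loss: did the reward model pick the correct verdict?
    judgment_loss = F.cross_entropy(verdict_logits, gold_verdicts)
    # Alignment term: cosine distance between the model's rationale embedding
    # and the embedding of a gold-standard rationale for the same example.
    align_loss = 1.0 - F.cosine_similarity(rationale_emb, gold_rationale_emb, dim=-1).mean()
    return judgment_loss + alpha * align_loss
```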
Forget complex RLHF pipelines: simple PPO with rule-based rewards can outperform state-of-the-art reasoning models while slashing training costs by 90%.
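The blurb doesn't list the rules themselves, but the general pattern behind rule-based rewards is simple: the reward is computed from the output string alone, with no learned reward model. Below is a minimal sketch of such a reward function; the `<think>` tag convention, the `\boxed{}` extraction regex, and the reward values are assumptions chosen for illustration.

```python
import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Illustrative rule-based reward: a format bonus plus an exact-match answer check."""
    reward = 0.0
    # Rule 1: small bonus for a well-formed reasoning trace (hypothetical tag convention).
    if "<think>" in completion and "</think>" in completion:
        reward += 0.1
    # Rule 2: extract the final boxed answer and compare it to the gold answer.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == gold_answer.strip():
        reward += 1.0
    return reward
```

In a PPO loop, this scalar would stand in for the learned reward model's score on each sampled completion, which is what removes the reward-model training and inference cost from the pipeline.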