A novel simplification of RLHF is proposed from the perspective of variational inference, called Variational Alignment with Re-weighting (VAR), which transforms the alignment objective into an offline, reward-driven, re-weighted supervised fine-tuning (SFT) objective.
Ditch RLHF's complexity: a variational re-weighting approach turns alignment into stable, reward-driven SFT that rivals existing methods.
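The core idea of reward-driven re-weighted SFT can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the exponential weighting, and the temperature `beta` are all assumptions chosen to show the general shape of the objective: each response's SFT log-likelihood is weighted by a normalized exponential of its offline reward, so high-reward responses dominate the supervised loss.

```python
import math

def reweighted_sft_loss(log_probs, rewards, beta=1.0):
    """Hypothetical sketch of a reward-re-weighted SFT objective.

    log_probs: per-response sequence log-likelihoods under the policy.
    rewards:   offline scalar rewards for the same responses.
    beta:      temperature controlling how sharply reward skews the weights.
    """
    # Softmax over rewards: higher reward -> larger weight on that response.
    exps = [math.exp(r / beta) for r in rewards]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Negative weighted log-likelihood: an offline, SFT-style loss,
    # with no sampling or on-policy rollouts required.
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

As `beta` grows, the weights flatten toward uniform and the loss reduces to plain SFT; as `beta` shrinks, training concentrates on the highest-reward responses.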