A novel simplification of RLHF is proposed from the perspective of variational inference, called Variational Alignment with Re-weighting (VAR), which transforms the alignment objective into an offline, reward-driven, re-weighted supervised fine-tuning (SFT) objective.
Ditch RLHF's complexity: a variational re-weighting approach turns alignment into stable, reward-driven SFT that rivals existing methods.
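The core idea of reward-driven re-weighted SFT can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the exponential weighting, and the temperature `beta` are all assumptions chosen to show the general shape of the objective: each response's SFT log-likelihood is weighted by a normalized exponential of its offline reward, so high-reward responses dominate the supervised loss.

```python
import math

def reweighted_sft_loss(log_probs, rewards, beta=1.0):
    """Hypothetical sketch of a reward-re-weighted SFT objective.

    log_probs: per-response sequence log-likelihoods under the policy.
    rewards:   offline scalar rewards for the same responses.
    beta:      temperature controlling how sharply reward skews the weights.
    """
    # Softmax over rewards: higher reward -> larger weight on that response.
    exps = [math.exp(r / beta) for r in rewards]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Negative weighted log-likelihood: an offline, SFT-style loss,
    # with no sampling or on-policy rollouts required.
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

As `beta` grows, the weights flatten toward uniform and the loss reduces to plain SFT; as `beta` shrinks, training concentrates on the highest-reward responses.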