Variational Alignment with Re-weighting (VAR) is a novel simplification of RLHF, derived from the perspective of variational inference, that transforms the alignment objective into an offline, reward-driven, re-weighted supervised fine-tuning (SFT) form.
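
To make the idea of a reward-driven, re-weighted SFT loss concrete, the sketch below computes a weighted negative log-likelihood over a batch of offline samples. It is an illustration only, not the authors' formulation: the function name `reweighted_sft_loss`, the temperature `beta`, and the choice of softmax-normalized exponential weights `exp(r / beta)` are all assumptions made here for the example.

```python
import math

def reweighted_sft_loss(log_probs, rewards, beta=1.0):
    """Reward-weighted negative log-likelihood over offline samples.

    log_probs: log pi(y_i | x_i) of each sampled response under the policy.
    rewards:   scalar reward of each response, computed offline.
    beta:      hypothetical temperature; smaller values sharpen the weights.
    """
    # Assumed weighting scheme: softmax of rewards / beta across the batch,
    # which yields non-negative weights that sum to 1.
    scaled = [r / beta for r in rewards]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Standard SFT loss, with each sample scaled by its reward-derived weight.
    loss = -sum(w * lp for w, lp in zip(weights, log_probs))
    return loss, weights
```

With equal rewards, the weights are uniform and the loss reduces to the ordinary mean SFT loss; with unequal rewards, higher-reward responses contribute more to the gradient. Because the rewards are fixed in advance, no online sampling from the policy is needed in this sketch.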