A novel simplification of RLHF is proposed from the perspective of variational inference, called Variational Alignment with Re-weighting (VAR), which transforms the alignment objective into an offline reward-driven re-weighted supervised fine-tuning (SFT) form.
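The core idea of reward-driven re-weighted SFT can be sketched in a few lines: offline samples are weighted by a softmax over their rewards, and the SFT loss becomes a weighted negative log-likelihood. This is a minimal illustrative sketch, not VAR's published formulation; the function names and the temperature parameter `beta` are assumptions.

```python
import math

# Sketch of reward-driven re-weighted SFT (illustrative, not the paper's exact
# objective). Weights w_i ∝ exp(r_i / beta) over an offline batch of samples.

def reward_weights(rewards, beta=1.0):
    """Softmax re-weighting of offline samples by reward."""
    m = max(rewards)                      # subtract max for numerical stability
    exps = [math.exp((r - m) / beta) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]

def reweighted_sft_loss(log_probs, rewards, beta=1.0):
    """Weighted negative log-likelihood: higher-reward samples count more."""
    w = reward_weights(rewards, beta)
    return -sum(wi * lp for wi, lp in zip(w, log_probs))
```

With `beta` small, the weights concentrate on the highest-reward samples; with `beta` large, the loss approaches plain uniform SFT.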
SeedPolicy overcomes the long-horizon limitations of Diffusion Policies in robot manipulation by compressing temporal information with a novel gated attention mechanism, achieving state-of-the-art imitation learning performance with significantly fewer parameters than vision-language-action models.
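Gated attention pooling of the kind described here can be illustrated with a small sketch: attention scores select informative timesteps while a sigmoid gate modulates each timestep's contribution, compressing a sequence into a single vector. This is a generic sketch of the mechanism under assumed projections `w_score` and `w_gate`; SeedPolicy's actual architecture is not reproduced here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_attention_pool(seq, w_score, w_gate):
    """Compress a temporal sequence of feature vectors into one vector.

    seq: list of feature vectors (lists of floats), one per timestep
    w_score, w_gate: learned projection vectors (same dim as the features)
    """
    scores = [sum(wi * xi for wi, xi in zip(w_score, x)) for x in seq]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]          # softmax attention over timesteps
    gates = [sigmoid(sum(wi * xi for wi, xi in zip(w_gate, x))) for x in seq]
    dim = len(seq[0])
    return [sum(a * g * x[d] for a, g, x in zip(attn, gates, seq))
            for d in range(dim)]
```

The output has the same dimensionality as a single timestep's features regardless of sequence length, which is what makes this kind of pooling useful for long horizons.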
Ditch RLHF's complexity: a variational re-weighting approach turns alignment into stable, reward-driven SFT that rivals existing methods.