Forget fancy objectives: a simple, regularized latent dynamics model can achieve state-of-the-art zero-shot RL performance, even with limited data.
Forget fixed margins in RLHF: modeling the *strength* of human preferences with "preference-over-preference" learning boosts both discriminative accuracy and generative quality.