Stanford University
ELBO-based RL, previously dismissed for generative model alignment, can actually beat MDP-based methods with the right tricks.