ELBO-based RL, previously dismissed for generative model alignment, can actually beat MDP-based methods with the right tricks.
MLLMs are surprisingly robust to catastrophic forgetting during fine-tuning, needing only simple regularization or data-hybrid training to maintain performance.