Ditch the ELBO: bypassing biased likelihood approximations in RL fine-tuning of diffusion LMs unlocks more stable and effective policy optimization, yielding nearly 20% accuracy gains on challenging tasks.
A 3B model, guided by a novel RL framework, can outperform a 20B model in capturing diverse human perspectives, challenging the assumption that larger models inherently possess better alignment.