Ditch the ELBO: bypassing biased likelihood approximations in RL fine-tuning of diffusion LMs unlocks more stable and effective policy optimization, yielding nearly 20% accuracy gains on challenging tasks.