Search papers, labs, and topics across Lattice.
1
0
3
4
Scale up offline policy training for diffusion LLMs without breaking the bank: dTRPO slashes trajectory computation costs while boosting performance up to 9.6% on STEM tasks.