Explicitly controlling policy-gradient variance lets asynchronous RL for LLMs train 2.5x faster without sacrificing the performance of synchronous training.