Search papers, labs, and topics across Lattice.
2
0
5
Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by cleverly managing multiple policy versions during rollout.
Skip the costly generative evals: a simple probe trained on internal LLM representations can accurately predict downstream task performance during training, slashing evaluation time from an hour to just three minutes.