Quantizing rollouts in LLM RL pipelines introduces a training-inference gap between the quantized policy that generates rollouts and the policy being trained. QaRL closes this gap, improving performance on math problems by +5.5.