Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.
Agentic RL can now outperform proprietary LLMs and torch.compile in the challenging domain of CUDA kernel generation, achieving up to 40% speedups on hard tasks.