Search papers, labs, and topics across Lattice.
1
0
3
4
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.