Unsupervised RL for math reasoning hinges on a model's pre-existing logical abilities, and its success can be predicted by whether the training trajectory stays within stable "manifolds" of good solutions.