Forget expensive labeled data: this method lets large reasoning models learn without labels and outperform supervised RL by dynamically generating training data from unlabeled test queries.