Recursive composition of verifiable environments can boost reasoning performance in RL by up to 3.1 points while using only a fraction of the original environments.