Turn sparse binary rewards into dense supervision signals by having a model revise its own work, then distilling the revision strategy back into the original generation.