Princeton University
Turn sparse binary rewards into dense supervision signals by having a model revise its own work, then distilling the revision strategy back into the original generation.