Implicit reward models can now more accurately pinpoint correct reasoning steps, thanks to a novel prefix-value learning approach that closes the train-inference gap.