IRLab, University of Amsterdam
Forget expensive human annotations: SubSearch unlocks more robust reasoning in LLMs by directly rewarding intermediate steps with intrinsic rewards, outperforming outcome-only supervision.