École polytechnique fédérale de Lausanne (EPFL)
Forget expensive human annotations: this unsupervised method trains reward models that steer LLM reasoning just as well as, or even better than, their supervised counterparts.