The Chinese University of Hong Kong
Forget expensive human annotation: this self-play method lets LLMs bootstrap their own training signals for open-ended tasks by generating rubrics to evaluate their own outputs.