Self-play can learn 2-3x faster when the "question construction path" it generates is exploited as privileged information for self-distillation.