Search papers, labs, and topics across Lattice.
1
0
3
2
Vanilla on-policy distillation falls apart in multi-turn settings due to compounding errors, but a simple curriculum on trajectory length fixes it, even letting students beat their teachers.