Achieve up to 12x greater sample efficiency in reasoning tasks by relaxing strict imitation constraints in on-policy distillation, enabling smaller models to match the performance of much larger ones.