Self-distillation can be more effective than learning from an external teacher, but only if you optimize for preference gaps instead of blindly matching the teacher's output distribution.