University of California
Counterintuitively, distilling LLMs is more effective when only the first few tokens of each student rollout are used: this surpasses full-trajectory distillation while saving compute.