University of California, Los Angeles
Counterintuitively, distilling LLMs is more effective when only the first few tokens of each student rollout are used: this prefix-only approach surpasses full-trajectory distillation while saving compute.
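
To make the idea concrete, here is a minimal sketch of a prefix-truncated distillation loss. It is an illustration under stated assumptions, not the method from the summarized work: the summary does not say which divergence is used, so the per-token KL(student || teacher) below is an assumption, and the names `token_kl`, `prefix_distill_loss`, and `prefix_len` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position.

    Assumption: the divergence is not given in the source; reverse KL is
    used here only as a common choice for on-policy distillation.
    """
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def prefix_distill_loss(student_logits, teacher_logits, prefix_len=None):
    """Mean per-token KL over the first `prefix_len` positions of a rollout.

    Both arguments are lists of per-position logit vectors for the same
    student-generated sequence. prefix_len=None scores the full trajectory.
    """
    total = len(student_logits)
    n = total if prefix_len is None else min(prefix_len, total)
    if n == 0:
        return 0.0
    pairs = zip(student_logits[:n], teacher_logits[:n])
    return sum(token_kl(s, t) for s, t in pairs) / n


# Toy rollout: 3 positions, vocabulary of 2 tokens.
student = [[1.0, 0.0], [0.5, 0.5], [2.0, -1.0]]
teacher = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0]]

prefix_loss = prefix_distill_loss(student, teacher, prefix_len=2)
full_loss = prefix_distill_loss(student, teacher)
```

In this sketch the compute saving comes from truncation: only the first `prefix_len` positions need to be generated by the student and scored by the teacher, rather than the whole rollout.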