Students can surpass their teachers in on-policy distillation by extrapolating rewards and merging knowledge from domain experts, challenging the conventional wisdom that students are inherently limited by their teachers' capabilities.