Self-distillation in LLMs can leak information and destabilize training, but combining it with verifiable rewards strikes a balance that improves both convergence and stability.
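One way to picture the combination is a weighted objective that mixes a self-distillation term with a verifier signal. The sketch below is purely illustrative and not taken from any specific paper: `combined_loss`, the `alpha` mixing weight, and the binary `reward` from a hypothetical verifier are all assumptions introduced here to make the idea concrete.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(student_logits, teacher_logits, reward, alpha=0.5):
    """Hypothetical combined objective (an assumption for illustration):
    alpha weights a self-distillation KL term against a verifiable-reward
    penalty, where `reward` in [0, 1] comes from an external verifier."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    distill = kl_divergence(p_teacher, p_student)  # match the teacher
    # (1 - reward) penalizes outputs the verifier rejects, so the
    # distillation signal alone cannot pull training toward bad modes.
    return alpha * distill + (1 - alpha) * (1.0 - reward)


# A student that matches the teacher and passes verification incurs zero loss;
# diverging from the teacher or failing verification raises the loss.
print(combined_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], reward=1.0))  # → 0.0
print(combined_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], reward=0.0) > 0)  # → True
```

Tuning `alpha` is where the claimed balance would live: too much weight on distillation risks amplifying leaked or unstable teacher behavior, while the verifier term anchors training to externally checkable outcomes.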