Forget external teachers: the best way to boost your RL policy might be learning from its future self.
Self-distillation in LLMs can leak information and destabilize training, but combining it with verifiable rewards yields a sweet spot for improved convergence and stability.
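To make the idea concrete, here is a minimal sketch of the kind of combined objective this describes: a policy-gradient term driven by a verifiable (checkable, 0/1) reward plus a self-distillation KL term pulling the current policy toward a "future self" teacher (assumed here to be, e.g., an exponential moving average of later policy weights). All function names, the `beta` weight, and the teacher construction are illustrative assumptions, not the source's actual method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(policy_logits, teacher_logits, logprob_action, reward, beta=0.1):
    """Verifiable-reward RL loss plus a self-distillation penalty.

    policy_logits  : current policy's logits for one state (hypothetical)
    teacher_logits : logits from the 'future self' teacher, e.g. an EMA copy
    logprob_action : log-prob the policy assigned to the sampled action
    reward         : verifiable reward (e.g. 1.0 if the answer checks out)
    beta           : weight on the distillation term (assumed hyperparameter)
    """
    # REINFORCE-style surrogate: push up log-prob of rewarded actions.
    pg_loss = -reward * logprob_action
    # Self-distillation: KL from the teacher's distribution to the student's.
    distill = kl(softmax(teacher_logits), softmax(policy_logits))
    return pg_loss + beta * distill
```

When the student already matches its future-self teacher, the distillation term vanishes and the objective reduces to the plain verifiable-reward loss; the `beta` knob is where the "sweet spot" between the two signals would be tuned.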