LLM post-training isn't just about objectives; it's about strategically intervening on model behavior through support expansion, policy reshaping, and behavioral consolidation.
On-policy distillation can lead to catastrophic length inflation in student models, but a simple fix stabilizes training and boosts performance by 7%.