Case Western Reserve University
Hybrid-thinking LLMs can be dramatically improved by simply separating the feed-forward pathways for reasoning and non-reasoning modes, leading to less leakage and better accuracy.
On-policy distillation can lead to catastrophic length inflation in student models, but a simple fix stabilizes training and boosts performance by 7%.