1
0
3
12
Steal accuracy from dense models and stabilize MoE training with a simple teacher-guided routing scheme that combats gradient starvation.