Forget auxiliary losses and fixed expert capacity: Expert Threshold routing dynamically allocates computation in MoEs and balances expert load, all while boosting data efficiency by 1.6x.