Ditch the auxiliary losses: Expert Threshold routing achieves better load balancing and language modeling performance than Token-Choice MoE by dynamically routing tokens based on learned thresholds.
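The summary says only that tokens are routed by learned thresholds; it does not spell out the rule. The sketch below contrasts the two routing styles under a stated assumption: each expert keeps a scalar threshold, tracked as an exponential moving average of the per-batch score cutoff that would give that expert a target share of tokens. This update rule is our assumption, not necessarily the paper's exact method, and every function name and the `capacity_frac` and `momentum` parameters are hypothetical.

```python
import random
from typing import List

Scores = List[List[float]]  # scores[token][expert], e.g. sigmoid router outputs


def top_k_route(scores: Scores, k: int) -> List[List[int]]:
    """Token-Choice baseline: every token takes exactly its k best experts."""
    return [sorted(range(len(row)), key=lambda e: -row[e])[:k] for row in scores]


def threshold_route(scores: Scores, thresholds: List[float]) -> List[List[int]]:
    """Threshold routing: a token goes to every expert whose score beats that
    expert's threshold, so it may get zero, one, or several experts."""
    return [[e for e, s in enumerate(row) if s > thresholds[e]] for row in scores]


def capacity_cutoff(column: List[float], keep: int) -> float:
    """Cutoff that admits exactly `keep` tokens under a strict `>` test
    (assuming distinct scores): the (keep+1)-th largest score."""
    return sorted(column, reverse=True)[keep]


def update_thresholds(thresholds: List[float], scores: Scores,
                      capacity_frac: float, momentum: float = 0.9) -> List[float]:
    """Hypothetical EMA rule: nudge each expert's threshold toward the cutoff
    that would give it `capacity_frac` of this batch's tokens."""
    n = len(scores)
    keep = min(n - 1, max(1, round(capacity_frac * n)))
    out = []
    for e, t in enumerate(thresholds):
        target = capacity_cutoff([row[e] for row in scores], keep)
        out.append(momentum * t + (1.0 - momentum) * target)
    return out


def expert_loads(routes: List[List[int]], num_experts: int) -> List[int]:
    """Number of tokens assigned to each expert."""
    loads = [0] * num_experts
    for experts in routes:
        for e in experts:
            loads[e] += 1
    return loads


if __name__ == "__main__":
    rng = random.Random(0)
    num_experts, batch = 4, 256

    def batch_scores() -> Scores:
        # Skewed router: expert e's scores are shifted up by 0.1 * e.
        return [[rng.random() + 0.1 * e for e in range(num_experts)]
                for _ in range(batch)]

    # capacity_frac=0.5 with 4 experts gives about 2 experts per token,
    # matching the compute of top-2 routing on average.
    thresholds = [0.5] * num_experts
    for _ in range(200):
        thresholds = update_thresholds(thresholds, batch_scores(), capacity_frac=0.5)

    fresh = batch_scores()
    print("top-2 loads:    ", expert_loads(top_k_route(fresh, 2), num_experts))
    print("threshold loads:", expert_loads(threshold_route(fresh, thresholds), num_experts))
```

In this toy setup, each threshold drifts toward the score that admits its target share of tokens, so per-expert loads come out roughly even without any auxiliary balancing term, while top-2 routing concentrates tokens on the experts with the larger score offsets. The trade-off is that compute per token varies, since a token can clear zero or several thresholds.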