Cut MoE expert-switching rates from 50% to under 5% with minimal accuracy loss by training a controller to decide *when* to switch experts, not just *which* expert to use.
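
To make the "when versus which" distinction concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the 0.5 threshold, and the hand-written `switch_gate` values are all assumptions for illustration, and a real controller would be learned rather than hard-coded. It compares ordinary per-token top-1 routing with a "sticky" variant that re-selects an expert only when a separate gate signal fires.

```python
def argmax(xs):
    """Index of the largest value in a list."""
    return max(range(len(xs)), key=lambda i: xs[i])


def route_per_token(scores):
    """Baseline: every token independently picks its top-scoring expert."""
    return [argmax(s) for s in scores]


def route_with_switch_gate(scores, switch_gate, threshold=0.5):
    """Hypothetical sticky routing: keep the current expert unless the
    gate value for this token exceeds the threshold, and only then
    re-select the top-scoring expert."""
    experts = []
    current = None
    for s, g in zip(scores, switch_gate):
        if current is None or g > threshold:
            current = argmax(s)  # the "which" decision, made only on a switch
        experts.append(current)
    return experts


def switch_rate(experts):
    """Fraction of consecutive token pairs routed to different experts."""
    if len(experts) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(experts, experts[1:]))
    return switches / (len(experts) - 1)


# Toy router scores for 5 tokens over 2 experts (made-up numbers).
scores = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
# Made-up gate outputs standing in for a trained controller.
switch_gate = [0.0, 0.1, 0.2, 0.9, 0.3]

baseline = route_per_token(scores)
sticky = route_with_switch_gate(scores, switch_gate)
print(baseline, switch_rate(baseline))
print(sticky, switch_rate(sticky))
```

In this toy input the baseline changes expert on three of four transitions, while the gated variant changes once. The accuracy cost of staying on a lower-scoring expert is not modeled here.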