Solve load balancing for sparse mixture-of-experts (SMoE) models at inference time, without retraining, by replicating heavily used experts and quantizing underutilized ones, reducing load imbalance by up to 1.4x.
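
To make the idea concrete, here is a minimal sketch of one way replication and quantization could be planned from observed routing statistics. It is not the paper's algorithm: the greedy policy, the function names (`plan_replication`, `pick_experts_to_quantize`, `imbalance`), the imbalance metric (maximum per-replica load divided by mean per-replica load), and the token counts are all assumptions chosen for illustration.

```python
import heapq


def plan_replication(token_counts, extra_replicas):
    """Greedy sketch (assumed policy): repeatedly give one more replica
    to the expert whose per-replica load is currently highest."""
    replicas = [1] * len(token_counts)
    # heapq is a min-heap, so store negated loads to pop the largest first.
    heap = [(-count, i) for i, count in enumerate(token_counts)]
    heapq.heapify(heap)
    for _ in range(extra_replicas):
        _, i = heapq.heappop(heap)
        replicas[i] += 1
        heapq.heappush(heap, (-token_counts[i] / replicas[i], i))
    return replicas


def pick_experts_to_quantize(token_counts, n):
    """Return the indices of the n least-used experts, the candidates
    for lower-precision storage (assumed selection rule)."""
    order = sorted(range(len(token_counts)), key=lambda i: token_counts[i])
    return order[:n]


def imbalance(token_counts, replicas):
    """Hypothetical metric: max per-replica load over mean per-replica load,
    assuming an expert's tokens split evenly across its replicas."""
    per_replica = [
        count / r for count, r in zip(token_counts, replicas) for _ in range(r)
    ]
    return max(per_replica) / (sum(per_replica) / len(per_replica))


# Made-up routing statistics: tokens routed to each of four experts.
counts = [800, 100, 60, 40]
replicas = plan_replication(counts, extra_replicas=2)
to_quantize = pick_experts_to_quantize(counts, 2)
before = imbalance(counts, [1] * len(counts))
after = imbalance(counts, replicas)
```

In this toy setup both extra replicas go to the single hot expert, and the two least-used experts are flagged for quantization, which is where the memory for the replicas would come from. The numbers are illustrative only and are unrelated to the 1.4x figure reported above.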