A fault in one GPU process no longer has to crash every other process sharing the GPU: this paper introduces mechanisms that make NVIDIA MPS (Multi-Process Service) fault-resilient, enabling more robust multi-tenant GPU clusters.
Addresses load imbalance in sparse mixture-of-experts (SMoE) models at inference time, without retraining, by replicating heavily used experts and quantizing underutilized ones, achieving up to a 1.4x reduction in imbalance.
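The replicate-hot, quantize-cold idea can be illustrated with a small sketch. It is not the paper's method. All function names (`imbalance`, `plan_experts`, `instance_loads`) are hypothetical, and imbalance is assumed to be the max-to-mean ratio of per-instance token load. Tokens routed to a replicated expert are assumed to split evenly across its copies. Quantization does not change token load in this model; it only marks the cold experts whose memory could, by assumption, be freed to host the replicas. The token counts are made up and say nothing about the paper's reported 1.4x figure.

```python
from typing import Dict, List, Tuple


def imbalance(loads: List[float]) -> float:
    """Max-to-mean ratio of per-instance load; 1.0 means perfectly balanced (assumed metric)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


def plan_experts(
    token_counts: Dict[int, int], n_replicate: int, n_quantize: int
) -> Tuple[List[int], List[int]]:
    """Hypothetical planner: busiest experts get replicated, idlest get quantized."""
    # Rank expert ids from most to least used (sorted is stable, so ties keep id order).
    ranked = sorted(token_counts, key=lambda e: token_counts[e], reverse=True)
    hot = ranked[:n_replicate]
    # Take quantization candidates from the cold end, coldest first, never overlapping `hot`.
    cold = [e for e in reversed(ranked) if e not in hot][:n_quantize]
    return hot, cold


def instance_loads(
    token_counts: Dict[int, int], hot: List[int], copies: int = 2
) -> List[float]:
    """Per-instance load, assuming a replicated expert's tokens split evenly across its copies."""
    loads: List[float] = []
    for expert, count in token_counts.items():
        if expert in hot:
            loads.extend([count / copies] * copies)
        else:
            loads.append(float(count))
    return loads


if __name__ == "__main__":
    counts = {0: 100, 1: 20, 2: 20, 3: 20}  # hypothetical tokens routed per expert
    hot, cold = plan_experts(counts, n_replicate=1, n_quantize=2)
    print("replicate:", hot, "quantize:", cold)
    print("imbalance before:", imbalance([float(c) for c in counts.values()]))
    print("imbalance after:", imbalance(instance_loads(counts, hot)))
```

In this toy setup, splitting the single hot expert across two copies lowers the max-to-mean ratio from 2.5 to about 1.56. A real system would also have to weigh the extra memory of each replica against the accuracy cost of quantizing the cold experts.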