Search papers, labs, and topics across Lattice.
2
0
2
FlashCP achieves up to 1.63x faster training for large language models by eliminating redundant communication and optimizing workload balance.
A fault in one GPU process no longer needs to crash them all: this paper introduces mechanisms for fault-resilient NVIDIA MPS, enabling more robust multi-tenant GPU clusters.