Search papers, labs, and topics across Lattice.
Sino-German Joint Software Institute, Beihang University
2
0
4
RATrain achieves a remarkable 1.35脳 speedup in training large language models on bandwidth-limited supercomputers, challenging the assumption that high-bandwidth environments are necessary for optimal performance.
LLM inference on supercomputers doesn't have to be a bottleneck: THInfer achieves up to 84% higher throughput than A800 GPUs by co-designing hardware-aware kernels and a communication-optimized pipeline.