Search papers, labs, and topics across Lattice.
1
0
3
LLM inference on supercomputers doesn't have to be a bottleneck: THInfer achieves up to 84% higher throughput than A800 GPUs by co-designing hardware-aware kernels and a communication-optimized pipeline.