Shizhe Shang

Research focus

Architecture Design (Transformers, SSMs, MoE) (1)
Distributed Systems & Hardware (1)
Inference & Quantization (1)

Frequent co-authors

Shiqing Ma (1)
Bin Han (1)
Hailong Yang (1)

Papers (1)

May 25, 2026

Bandwidth-Aware LLM Inference on Heterogeneous Many-Core Supercomputers

LLM inference on supercomputers doesn't have to be a bottleneck: THInfer achieves up to 84% higher throughput than A800 GPUs by co-designing hardware-aware kernels and a communication-optimized pipeline.

Shiqing Ma, Bin Han, Shizhe Shang +1

Architecture Design (Transformers, SSMs, MoE)
Distributed Systems & Hardware
Inference & Quantization
