Yuedong Xu is with the College of Computer Science and Artificial Intelligence, and the Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, China (e-mail: ydxu@fudan.edu.cn).
Double your LLM inference throughput by routing KV-cache through decoding engines to bypass the bandwidth bottleneck on prefill engines.