Institute of Computing Technology, Chinese Academy of Sciences
Forget static KV cache compression – KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.
Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.