Institute of Computing Technology, Chinese Academy of Sciences
Forget static KV cache compression: KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.
Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.
Cut your debugging time: CCL-D slashes the diagnosis time for slow/hang anomalies in large-scale distributed training from days to just 6 minutes.