Institute of Computing Technology, Chinese Academy of Sciences
Forget static KV cache compression: KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.
Cut your debugging time: CCL-D slashes the diagnosis time for slow/hang anomalies in large-scale distributed training from days to just 6 minutes.