Peking University, Huawei Cloud
Restructuring the KV cache from a monolithic structure into a dynamic, head-aware system could substantially improve LLM serving efficiency and scalability.