MLLMs can achieve up to 7.9x KV cache compression and 1.52x faster decoding without sacrificing performance by applying distinct compression strategies to different attention heads.
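The summary does not spell out how head-wise compression works; as a rough illustration only, the sketch below (all names and the budget policy are hypothetical, not the paper's method) shows the core idea of retaining a different number of KV-cache positions per attention head, keeping the positions that accumulated the most attention mass:

```python
import numpy as np

def compress_kv_per_head(keys, values, attn_mass, budgets):
    """Keep only the top-scoring cached positions for each head.

    keys, values: (num_heads, seq_len, head_dim) cached tensors
    attn_mass:    (num_heads, seq_len) accumulated attention per position
    budgets:      per-head count of positions to retain (illustrative policy)
    Returns a list of (keys, values) pairs, one per head, each possibly
    a different length -- the "distinct strategy per head" idea.
    """
    compressed = []
    for h, budget in enumerate(budgets):
        keep = np.argsort(attn_mass[h])[-budget:]  # top-`budget` positions
        keep.sort()                                # preserve sequence order
        compressed.append((keys[h, keep], values[h, keep]))
    return compressed

# Toy example: 2 heads, 8 cached positions, head_dim 4
rng = np.random.default_rng(0)
k = rng.normal(size=(2, 8, 4))
v = rng.normal(size=(2, 8, 4))
mass = rng.random((2, 8))
out = compress_kv_per_head(k, v, mass, budgets=[2, 6])
```

Here head 0 keeps only 2 of 8 positions while head 1 keeps 6, so the overall cache shrinks without forcing a uniform ratio on every head.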