Ruhr University Bochum
Not all layers are created equal: pruning the KV cache in a layer-dependent manner significantly boosts long-context LLM performance compared to uniform pruning strategies.