Forget content, remember position: crafting pseudo-queries based on token position alone yields surprisingly effective KV cache compression for LLMs, rivaling methods that analyze input semantics.
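A minimal sketch of the idea, not the paper's implementation: build a content-free pseudo-query from the current decoding position alone (here via an assumed RoPE-style rotation of a constant vector), score the cached keys against it, and keep only the top-scoring fraction of KV pairs. All function and parameter names are illustrative.

```python
import numpy as np

def rope_rotate(vec: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply a RoPE-style rotation to `vec` for position `pos` (assumption)."""
    d = vec.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = vec[:half], vec[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def compress_kv(keys: np.ndarray, values: np.ndarray, query_pos: int,
                keep_ratio: float = 0.2):
    """Keep the top `keep_ratio` of KV pairs, scored by a pseudo-query
    built only from the decoding position -- no input semantics used."""
    d = keys.shape[-1]
    pseudo_q = rope_rotate(np.ones(d) / np.sqrt(d), query_pos)
    scores = keys @ pseudo_q / np.sqrt(d)              # (seq_len,)
    k = max(1, int(keep_ratio * keys.shape[0]))
    keep = np.sort(np.argsort(scores)[-k:])            # preserve original token order
    return keys[keep], values[keep], keep

# Toy example: compress a 1024-token cache down to 20%.
keys, values = np.random.randn(1024, 64), np.random.randn(1024, 64)
k_small, v_small, kept_idx = compress_kv(keys, values, query_pos=1024)
```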
Achieve 11.8x faster reasoning with 80% KV cache compression by estimating token importance directly from FlashAttention's intermediate results – no extra compute needed.
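One way to read "FlashAttention's intermediate results" is the per-query-row log-sum-exp (LSE) that the kernel already stores: with it, the exact softmax weight of any cached key is exp(q·k/sqrt(d) - LSE), so importance scores can be recovered without rebuilding or re-normalizing the full attention matrix. The sketch below illustrates that general idea with hypothetical names; it is not the paper's actual algorithm.

```python
import numpy as np

def token_importance(q: np.ndarray, keys: np.ndarray, lse: float) -> np.ndarray:
    """Exact attention weights of `q` over `keys`, reusing the stored LSE
    instead of recomputing the softmax normalizer."""
    d = q.shape[-1]
    logits = keys @ q / np.sqrt(d)      # (seq_len,)
    return np.exp(logits - lse)         # already-normalized softmax weights

def evict(keys, values, importance, keep_ratio=0.2):
    """Drop the lowest-importance 80% of the KV cache."""
    k = max(1, int(keep_ratio * keys.shape[0]))
    keep = np.sort(np.argsort(importance)[-k:])
    return keys[keep], values[keep]

# Toy example: one decoding step over a 1024-token cache.
d, n = 64, 1024
q = np.random.randn(d)
keys, values = np.random.randn(n, d), np.random.randn(n, d)
logits = keys @ q / np.sqrt(d)
lse = np.log(np.exp(logits - logits.max()).sum()) + logits.max()  # what the kernel would have stored
w = token_importance(q, keys, lse)
keys, values = evict(keys, values, w)
```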