Achieve 11.8× faster reasoning with 80% KV cache compression by estimating token importance directly from FlashAttention's intermediate results, with no extra compute needed.
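The summary does not spell out the scoring rule. The sketch below shows one plausible reading, under stated assumptions. FlashAttention's forward pass keeps a per-query log-sum-exp (LSE) of the attention logits for its backward pass. Given that stored value, any attention probability is exp(s_ij - LSE_i), with no second softmax normalization. The "total attention received" importance rule and all function names (`row_logsumexp`, `token_importance`, `keep_top_tokens`) are hypothetical, not the paper's method. Because this NumPy version recomputes q·kᵀ for readability, it does not reproduce the no-extra-compute property. That property would require accumulating the scores inside the fused kernel.

```python
import numpy as np

def row_logsumexp(scores: np.ndarray) -> np.ndarray:
    """Per-query log-sum-exp of the attention logits (one float per query row).
    FlashAttention keeps this statistic from its online softmax."""
    m = scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    return (m + np.log(np.exp(scores - m).sum(axis=-1, keepdims=True)))[:, 0]

def token_importance(q: np.ndarray, k: np.ndarray, lse: np.ndarray) -> np.ndarray:
    """Hypothetical importance rule: total attention each cached token receives,
    i.e. column sums of the softmax probabilities p_ij = exp(s_ij - lse_i)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])  # recomputed here only for clarity
    probs = np.exp(scores - lse[:, None])    # exact softmax using the stored LSE
    return probs.sum(axis=0)                 # one score per cached token

def keep_top_tokens(importance: np.ndarray, keep_ratio: float = 0.2) -> np.ndarray:
    """Indices of cached tokens to retain; keep_ratio=0.2 means 80% of the cache is evicted."""
    n_keep = max(1, int(round(keep_ratio * importance.size)))
    order = np.argsort(-importance, kind="stable")  # highest importance first
    return np.sort(order[:n_keep])                  # restore original token order

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 16))   # 4 recent query vectors
    k = rng.standard_normal((50, 16))  # 50 cached key vectors
    lse = row_logsumexp(q @ k.T / np.sqrt(16))
    kept = keep_top_tokens(token_importance(q, k, lse), keep_ratio=0.2)
    print(kept)
```

Since every softmax row sums to 1, the importance scores of all cached tokens add up to the number of queries used. This keeps scores comparable across decoding steps under this assumed rule.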