In long-context reasoning, LLMs can achieve 2.5x higher throughput and a 10.7x reduction in KV-cache memory by compressing the KV cache with trigonometric functions derived from the distributions of pre-RoPE query/key vectors.
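For background on the trigonometric functions involved: the summary does not spell out the compression scheme itself, but the "pre-RoPE" framing refers to query/key vectors before the standard Rotary Position Embedding (RoPE) rotation is applied. The sketch below is a minimal NumPy illustration of that standard RoPE mechanism only (the cos/sin pairwise rotation); the `rope_rotate` function name and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    # Standard RoPE: rotate consecutive dimension pairs of x by
    # position-dependent angles built from cos/sin.
    # x: (seq_len, d) with d even; positions: (seq_len,)
    seq_len, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)      # (d/2,) per-pair frequencies
    angles = positions[:, None] * inv_freq[None, :]   # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                   # split into rotation pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Pre-RoPE keys: the vectors whose distribution a compression scheme
# would analyze before the rotation is applied.
keys = np.random.randn(8, 64).astype(np.float32)
rotated = rope_rotate(keys, np.arange(8))

# The rotation preserves per-vector norms, so attention scores differ
# from the pre-RoPE dot products only by relative-position phase shifts.
assert np.allclose(np.linalg.norm(rotated, axis=1),
                   np.linalg.norm(keys, axis=1), atol=1e-3)
```

The norm-preservation property shown by the final assertion is one reason pre-RoPE vectors are an attractive target for compression: statistics gathered before the rotation remain meaningful after it.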