LLMs can maintain long-context performance even with aggressive KV-cache eviction by learning to predict token importance and compressing evicted tokens into a latent memory.
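
To make the idea concrete, here is a minimal sketch of one eviction step under several assumptions that do not come from the summary above: importance scores are passed in as non-negative numbers (standing in for the learned predictor), the cache is a plain list of key and value vectors, and the evicted entries are folded into a single extra "memory" entry by an importance-weighted average (standing in for whatever learned compression the work actually uses). The names `evict_and_compress` and `_weighted_sum` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _weighted_sum(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return sum_i weights[i] * vectors[i], computed per dimension."""
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]


def evict_and_compress(
    keys: Sequence[Vector],
    values: Sequence[Vector],
    importance: Sequence[float],
    budget: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Keep the `budget` highest-scoring entries; merge the rest into one slot.

    Assumption: `importance` holds non-negative scores, one per cached token.
    """
    n = len(keys)
    if not (len(values) == n and len(importance) == n):
        raise ValueError("keys, values, and importance must have equal length")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Rank positions by score, then restore original order for the kept ones.
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:budget])
    evicted = ranked[budget:]

    new_keys = [list(keys[i]) for i in kept]
    new_values = [list(values[i]) for i in kept]

    if evicted:
        total = sum(importance[i] for i in evicted)
        # Fall back to a plain mean if every evicted score is zero.
        weights = [
            importance[i] / total if total > 0 else 1.0 / len(evicted)
            for i in evicted
        ]
        # Illustrative stand-in for a learned compressor: one averaged slot.
        new_keys.append(_weighted_sum([keys[i] for i in evicted], weights))
        new_values.append(_weighted_sum([values[i] for i in evicted], weights))

    return new_keys, new_values
```

With four cached tokens and a budget of two, the call returns the two highest-scoring entries in their original order plus one merged entry, so the cache shrinks from four slots to three while the evicted tokens still contribute a summary vector.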