Chinese University of Hong Kong
LLMs can compress their KV cache more effectively by dynamically synthesizing soft tokens tailored to each input, preserving crucial context that static compression methods would otherwise lose.