Guangdong Institute of Intelligence Science and Technology, University of Macau
LLMs can compress their KV cache more effectively by dynamically synthesizing soft tokens tailored to each input, preserving crucial context that static compression methods would otherwise lose.