Rensselaer Polytechnic Institute
LLMs can slash memory use by 4x during reasoning without sacrificing accuracy, simply by "zooming in" on relevant cached information instead of attending to everything.