Search papers, labs, and topics across Lattice.
1
0
3
3
GPT-2 wastes 88% of its attention on predictable information, but this new method learns to forget, achieving a 37.8x compute reduction by dynamically consolidating episodic retrievals into parametric memory.