University of California
Attention bottlenecks in long-context decoding? SANTA slashes memory bandwidth demands by stochastically sampling value vectors, achieving 1.5x speedups without sacrificing accuracy.