Exploit the surprisingly stable yet heterogeneous sparsity patterns across attention heads to cut LLM attention latency by 2.88x without sacrificing quality.
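The core idea, per-head sparsity that differs across heads, can be sketched as plain top-k sparse attention where each head keeps its own number of key positions. This is a hypothetical NumPy illustration under assumed shapes (`H` heads, `T` tokens, head dim `d`), not the paper's actual kernel, and the `keep_per_head` budgets are invented for the example.

```python
import numpy as np

def sparse_attention(q, k, v, keep_per_head):
    """Per-head top-k sparse attention: each head h keeps only its own
    keep_per_head[h] key positions per query row (illustrative sketch of
    head-heterogeneous sparsity, not the paper's exact algorithm)."""
    H, T, d = q.shape
    out = np.empty_like(q)
    for h in range(H):
        scores = q[h] @ k[h].T / np.sqrt(d)              # (T, T) logits
        kk = keep_per_head[h]
        # per-row threshold = kk-th largest score; mask everything below it
        thresh = np.sort(scores, axis=-1)[:, -kk][:, None]
        masked = np.where(scores >= thresh, scores, -np.inf)
        # numerically stable softmax over the surviving entries only
        w = np.exp(masked - masked.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[h]
    return out
```

With `keep_per_head[h] == T` a head degenerates to dense attention, so a "dense" head and a heavily sparsified head can coexist in one call, which is the heterogeneity the blurb refers to; a real implementation would fuse the masking into the kernel rather than materialize the full score matrix.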