Forget pruning by variance: high-variance activations in transformers are surprisingly uncorrelated with predictive power.
Forget throwing away dimensions – quantizing your KV cache gives you way better compression for the same memory footprint, and the math proves why.