LLM inference can be sped up by 21% without retraining, thanks to a new sparsity method that prunes activations based on the importance of the weights they interact with.
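The core idea can be sketched as follows: score each activation by combining its own magnitude with the size of the weight rows it multiplies, then zero out the low-scoring ones before the matmul. This is an illustrative sketch of the general technique, not the paper's exact criterion; the `|x| * ||W_row||` score and the `keep_ratio` parameter are assumptions for demonstration.

```python
import numpy as np

def prune_activations(x, W, keep_ratio=0.5):
    """Zero out the least important activations before computing x @ W.

    Importance of x[i] is taken as |x[i]| * ||W[i, :]||_2: an activation
    matters more when the weight rows it feeds into are large.
    (Hypothetical scoring rule, chosen to illustrate weight-aware pruning.)
    """
    importance = np.abs(x) * np.linalg.norm(W, axis=1)
    k = max(1, int(keep_ratio * x.size))
    keep = np.argsort(importance)[-k:]  # indices of the top-k activations
    mask = np.zeros_like(x)
    mask[keep] = 1.0
    return x * mask

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = rng.standard_normal((8, 4))
x_sparse = prune_activations(x, W, keep_ratio=0.5)
y = x_sparse @ W  # only half the activations contribute to the matmul
```

In a real deployment the zeroed activations let the kernel skip the corresponding weight rows entirely, which is where the inference-time speedup comes from.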