LLMs can be pruned more effectively by considering the information entropy of their output distribution, an approach that overcomes limitations of traditional cross-entropy-based Taylor pruning.
Adaptive MLP pruning achieves a near-lossless 40% reduction in parameters and FLOPs for large vision transformers such as CLIP and DINOv2, without finetuning.