Feb 25, 2026 · arXiv:2602.21652

Sparsity Induction for Accurate Post-Training Pruning of Large Language Models

Minhao Jiang, Zhikai Li, Xuewen Liu, Jing Zhang, Mengjuan Chen, Qingyi Gu

AI Summary

This paper introduces Sparsity Induction, a post-training sparsity (PTS) technique for large language models that makes a model more sparsity-friendly before any weights are pruned. Sparsity Induction operates at two levels: at the distribution level, it uses mathematically equivalent scaling transformations to enhance distributional sparsity; at the feature level, it uses a Spectral Norm Loss to promote feature sparsity from a low-rank perspective. Experiments across various model architectures and tasks show that Sparsity Induction achieves better pruning performance than existing PTS methods by making models more amenable to weight removal.

Key Contribution

Achieve superior LLM pruning performance by first nudging models toward sparsity-friendliness *before* applying any weight removal.

Abstract

Large language models have demonstrated capabilities in text generation, while their increasing parameter scales present challenges in computational and memory efficiency. Post-training sparsity (PTS), which reduces model cost by removing weights from dense networks, is an effective approach. However, native dense matrices lack high sparsity, making existing approaches that directly remove weights disrupt model states, resulting in unsatisfactory performance recovery even with post-tuning. We propose Sparsity Induction, which promotes models toward higher sparsity at both distribution and feature levels before pruning, to push the limits of PTS. At the distribution level, we enhance distributional sparsity through mathematically equivalent scaling transformations, which are fully absorbable and incur no extra parameters or inference-time overhead. At the feature level, we introduce Spectral Norm Loss to promote feature sparsity from a low-rank perspective. Experiments across diverse model architectures and tasks demonstrate that our method further enhances sparsity-friendliness, achieving superior pruning performance over existing approaches.
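The idea of a "mathematically equivalent, fully absorbable" scaling can be illustrated with a minimal sketch. The abstract does not specify the exact transformation, so everything here is an assumption for illustration: two consecutive linear maps with no nonlinearity between them, a hypothetical helper `rescale_pair`, and hand-picked scale factors `s`. Each hidden channel is divided by a factor in the first matrix and multiplied by the same factor in the second, so the composed output is unchanged while the weight distributions shift.

```python
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rescale_pair(W1, W2, s):
    # Hypothetical helper (not from the paper): divide output channel j of W1
    # by s[j] and multiply input column j of W2 by s[j]. The scales are folded
    # into the weights, so no extra parameters or runtime operations remain.
    W1_scaled = [[w / s[j] for w in row] for j, row in enumerate(W1)]
    W2_scaled = [[w * s[j] for j, w in enumerate(row)] for row in W2]
    return W1_scaled, W2_scaled

random.seed(0)
d_in, d_hidden, d_out = 4, 3, 2
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
s = [0.5, 2.0, 4.0]  # assumed per-channel scales, chosen arbitrarily
x = [random.gauss(0, 1) for _ in range(d_in)]

# Output of the original pair versus the rescaled pair.
y_ref = matvec(W2, matvec(W1, x))
W1_s, W2_s = rescale_pair(W1, W2, s)
y_new = matvec(W2_s, matvec(W1_s, x))

# The difference should be at floating-point rounding level only.
max_diff = max(abs(a - b) for a, b in zip(y_ref, y_new))
print(max_diff)
```

How the scales are chosen so that the rescaled weights become sparser is the substance of the method and is not shown here; the sketch only demonstrates why such a rescaling leaves the function of the layer pair unchanged.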

Inference & Quantization · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
