Microsoft ResearchBITPKUMar 5, 2026arXiv:2603.05168

Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

Di Zhang, Xun Wu, Shaohan Huang, Yudong Wang, Hanyong Shao, Yingbo Hao, Zewen Chi, Li Dong, Ting Song, Yan Xia, Zhifang Sui, Furu Wei

AI Summary

The paper introduces Sparse-BitNet, a framework combining 1.58-bit quantization with dynamic N:M sparsity for efficient LLMs, demonstrating that BitNet's low-bit representation is inherently more compatible with sparsity than full-precision models. They show that 1.58-bit BitNet models exhibit less performance degradation and tolerate higher sparsity levels compared to full-precision counterparts under similar conditions. Using a custom sparse tensor core, Sparse-BitNet achieves up to 1.30x speedups in both training and inference.

Key Contribution

1.58-bit LLMs are surprisingly more resilient to sparsity than their full-precision counterparts, opening new avenues for extreme compression.

Abstract

Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that jointly applies 1.58-bit quantization and dynamic N:M sparsification while ensuring stable training for the first time. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels and can tolerate higher structured sparsity before accuracy collapse. Moreover, using our custom sparse tensor core, Sparse-BitNet achieves substantial speedups in both training and inference, reaching up to 1.30X. These results highlight that combining extremely low-bit quantization with semi-structured N:M sparsity is a promising direction for efficient LLMs. Code available at https://github.com/AAzdi/Sparse-BitNet

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

Related Papers