Feb 17, 2026 | arXiv:2602.15563

1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization

Sohir Maskey, Constantin Eichenberg, Johannes Messner, Douglas Orr

AI Summary

This paper investigates quantization-aware training (QAT) in the low-bit regime, focusing on the trade-off between quantization format, bit-width, and downstream performance. The authors demonstrate that k-means based weight quantization outperforms integer quantization formats in low-bit scenarios. Their key finding is that, for a fixed inference memory budget, 1-bit quantized weights achieve the best performance on generative downstream tasks.

Key Contribution

Under a fixed inference memory budget, 1-bit weights quantized with k-means can outperform higher-bit integer quantization on generative downstream tasks.
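To make the "fixed memory budget" framing concrete, here is a rough back-of-the-envelope sketch of how many weights fit in a given budget at different bit-widths. The 4 GiB budget and the helper name `params_within_budget` are illustrative assumptions, not values from the paper, and the calculation ignores overhead such as codebooks, scales, and non-quantized layers.

```python
def params_within_budget(budget_bytes: int, bits_per_weight: float) -> int:
    """Number of weights that fit in budget_bytes at the given bit-width.

    Hypothetical helper: counts raw weight storage only, with no allowance
    for codebooks, scale factors, or layers kept in higher precision.
    """
    return int(budget_bytes * 8 // bits_per_weight)


if __name__ == "__main__":
    budget = 4 * 1024**3  # assumed 4 GiB budget, chosen only for illustration
    for bits in (16, 8, 4, 2, 1):
        n = params_within_budget(budget, bits)
        print(f"{bits:>2}-bit: {n / 1e9:.2f}B weights")
```

Under these simplifying assumptions, halving the bit-width doubles the number of weights that fit, so a 1-bit model can hold 16 times as many weights as a 16-bit model in the same memory.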

Abstract

Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge in practice. The full design space of quantization is not fully explored in the context of QAT, and the precise trade-off between quantization and downstream performance is poorly understood, as comparisons often rely solely on perplexity-based evaluations. In this work, we address these shortcomings with an empirical study of QAT in the low-bit regime. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware. Furthermore, we find that, under a fixed inference memory budget, the best performance on generative downstream tasks is achieved with $1$-bit quantized weights.
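As a minimal sketch of what k-means based weight quantization means in general, the following runs scalar k-means (Lloyd's algorithm) over a list of weights and stores each weight as an index into a small codebook of 2^bits centroids. This is a generic illustration, not the authors' implementation: the function names, the evenly spaced initialization, and the iteration count are assumptions, and the abstract does not specify how the paper groups weights or integrates the quantizer into QAT.

```python
import random


def nearest(codebook, w):
    """Index of the centroid closest to weight w."""
    return min(range(len(codebook)), key=lambda j: abs(w - codebook[j]))


def kmeans_quantize(weights, bits, iters=20):
    """Scalar k-means (Lloyd's algorithm) with 2**bits centroids.

    Returns (codebook, indices) where codebook[indices[i]] approximates
    weights[i]. Hypothetical helper for illustration only.
    """
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Assumed initialization: centroids spread evenly over the weight range.
    codebook = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        # Assignment step: map each weight to its nearest centroid.
        indices = [nearest(codebook, w) for w in weights]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:  # an empty cluster keeps its previous centroid
                codebook[j] = sum(members) / len(members)
    # Final assignment so the indices match the returned codebook.
    indices = [nearest(codebook, w) for w in weights]
    return codebook, indices


def dequantize(codebook, indices):
    """Reconstruct approximate weights from the codebook and indices."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 1.0) for _ in range(1000)]
    codebook, indices = kmeans_quantize(weights, bits=1)
    approx = dequantize(codebook, indices)
    mse = sum((w - a) ** 2 for w, a in zip(weights, approx)) / len(weights)
    print("1-bit codebook:", codebook)
    print("reconstruction MSE:", mse)
```

Unlike an integer format, whose levels sit on a fixed uniform grid, the centroids here are fitted to the weight distribution itself. With `bits=1` each weight costs a single bit of index storage plus a shared two-entry codebook.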

Inference & Quantization | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
