PolarQuant is a post-training quantization method for LLMs that exploits the distributional properties of weights for near-lossless compression. It normalizes weights block-wise to the unit hypersphere, applies a Walsh-Hadamard rotation so the coordinates become approximately Gaussian, and then quantizes them with Gaussian-matched centroids. Experiments on Qwen3.5-9B show that the Hadamard rotation is the key step, reducing Q5 perplexity to near-FP16 levels, and that PolarQuant also improves downstream INT4 quantization.
Hadamard rotations unlock near-lossless 5-bit quantization for LLMs, outperforming standard techniques without calibration data.
We present PolarQuant, a post-training weight quantization method for large language models (LLMs) that exploits the distributional structure of neural network weights to achieve near-lossless compression. PolarQuant operates in three stages: (1) block-wise normalization to the unit hypersphere, (2) Walsh-Hadamard rotation to transform coordinates into approximately Gaussian random variables, and (3) quantization with centroids matched to the Gaussian distribution. Our ablation reveals that Hadamard rotation alone accounts for 98% of the quality improvement, reducing Qwen3.5-9B perplexity from 6.90 (absmax Q5) to 6.40 (Δ = +0.03 from FP16), making it practically lossless without any calibration data. Furthermore, PolarQuant functions as an effective preprocessing step for downstream INT4 quantizers: PolarQuant Q5 dequantized and re-quantized by torchao INT4 achieves perplexity 6.56 versus 6.68 for direct absmax INT4, while maintaining 43.1 tok/s throughput at 6.5 GB VRAM. Code and models are publicly available.
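The three stages above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the block size, the √n rescaling convention, and the sample-based centroid construction are assumptions (the paper's "Gaussian-matched centroids" may be computed differently, e.g. via Lloyd-Max).

```python
import numpy as np

def hadamard_rotate(x):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two.
    The transform is its own inverse, so the same call undoes the rotation."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    y = x.reshape(-1, n).copy()
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = y[:, i:i + h].copy()
            b = y[:, i + h:i + 2 * h].copy()
            y[:, i:i + h] = a + b
            y[:, i + h:i + 2 * h] = a - b
        h *= 2
    return (y / np.sqrt(n)).reshape(x.shape)

def gaussian_centroids(bits, n_samples=1_000_000, seed=0):
    """Centroids matched to N(0,1): conditional means of 2**bits
    equal-probability bins, estimated from samples (an assumption,
    standing in for whatever construction the paper uses)."""
    k = 2 ** bits
    s = np.sort(np.random.default_rng(seed).standard_normal(n_samples))
    return s[: n_samples - n_samples % k].reshape(k, -1).mean(axis=1)

def polar_quantize_block(w, bits=5):
    # (1) block-wise normalization to the unit hypersphere
    scale = np.linalg.norm(w)
    u = w / scale
    # (2) Hadamard rotation; rescaling by sqrt(n) makes the coordinates
    # of a random unit vector approximately N(0,1)
    z = hadamard_rotate(u) * np.sqrt(w.size)
    # (3) map each coordinate to its nearest Gaussian-matched centroid
    c = gaussian_centroids(bits)
    idx = np.abs(z[:, None] - c[None, :]).argmin(axis=1)
    return idx, scale, c

def polar_dequantize(idx, scale, c, n):
    # invert the rotation (Hadamard is self-inverse) and the normalization
    return hadamard_rotate(c[idx] / np.sqrt(n)) * scale
```

A round trip on a random block (quantize, then dequantize) recovers the weights up to a small per-coordinate quantization error, which is the regime in which the Q5 results above operate.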