This paper introduces CompSRT, a compression method for the SwinIR-light image super-resolution transformer, based on insights from analyzing the statistical effects of Hadamard transforms on weight and activation distributions. The authors demonstrate that Hadamard transforms improve quantization by reducing the range of values and increasing the proportion of values near zero. CompSRT combines Hadamard-based quantization with scalar decomposition, achieving state-of-the-art quantization performance with gains up to 1.53 dB and further improving compression via 40% pruning at 3-4 bits with minimal performance degradation.
Hadamard transforms aren't magic for quantizing image super-resolution transformers: they just squash the dynamic range and push values toward zero, and CompSRT leverages this for SOTA compression.
Model compression has become an important tool for making image super-resolution models more efficient. However, the gap between the best compressed models and their full-precision counterparts remains large, and a deeper understanding of how compression behaves on more performant models is still needed. Prior research on quantizing LLMs has shown that Hadamard transforms yield weights and activations with fewer outliers, which improves performance. We argue that while the Hadamard transform does reduce the effect of outliers, an empirical analysis of how the transform functions is still needed. By studying the weight and activation distributions of SwinIR-light, we show through statistical analysis that the lower errors are caused by the Hadamard transform's ability to reduce value ranges and increase the proportion of values near $0$. Based on these findings, we introduce CompSRT, a more performant way to compress the image super-resolution transformer SwinIR-light. We perform Hadamard-based quantization and apply scalar decomposition to introduce two additional trainable parameters. Our quantization performance statistically significantly surpasses the SOTA, with gains as large as 1.53 dB, and visibly improves visual quality by reducing blurriness at all bitwidths. To show our method is compatible with pruning for increased compression, we also prune $40\%$ of weights at $3$-$4$ bits and achieve a $6.67$-$15\%$ reduction in bits per parameter with performance comparable to the SOTA.
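The intuition behind Hadamard-based quantization can be illustrated with a toy sketch (this is not the authors' code or the CompSRT implementation: the weight matrix, the injected outlier, and the symmetric per-tensor quantizer below are illustrative assumptions). An orthogonal Hadamard rotation spreads a single large outlier across all entries, shrinking the dynamic range that sets the quantization scale, so quantizing in the rotated space and rotating back gives a much smaller reconstruction error:

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester construction (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor uniform quantization (scale set by the max magnitude)."""
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(x).max() / qmax
    return np.round(x / s).clip(-qmax, qmax) * s

rng = np.random.default_rng(0)
# Hypothetical stand-in for a transformer weight matrix, with one injected outlier
W = rng.normal(0.0, 0.02, size=(64, 64))
W[0, 0] = 1.5  # outlier dominates the per-tensor quantization range

H = hadamard(64)
W_rot = H @ W @ H.T  # orthogonal rotation: invertible, no information lost

# Quantize directly vs. quantize in the rotated space and rotate back
mse_plain = np.mean((quantize(W) - W) ** 2)
W_hat = H.T @ quantize(W_rot) @ H
mse_hadamard = np.mean((W_hat - W) ** 2)

print(f"range before/after rotation: {np.ptp(W):.3f} / {np.ptp(W_rot):.3f}")
print(f"4-bit MSE, plain / Hadamard: {mse_plain:.2e} / {mse_hadamard:.2e}")
```

Because the rotation is orthogonal, the quantization error measured in the rotated space carries over unchanged to the original space, so the reduced range translates directly into a smaller scale and lower error.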