This paper introduces Boundary Suppressed K-Means Quantization (BS-KMQ), a novel nonlinear quantization technique that reduces ADC resolution requirements in in-memory computing by suppressing boundary outliers before clustering, leading to more balanced quantization levels. A reconfigurable in-memory nonlinear ADC implements the resulting nonlinear references, achieving a 7x area improvement over previous designs. Evaluated on various models, BS-KMQ demonstrates significantly lower quantization error and improved post-training quantization accuracy compared to existing methods, achieving up to 4x speedup and 24x energy efficiency improvement in system-level simulations.
By intelligently suppressing boundary outliers before quantization, BS-KMQ slashes quantization error by 3x and boosts energy efficiency by 24x in in-memory computing.
In deep networks, operations such as ReLU and hardware-driven clamping often cause activations to accumulate near the edges of the distribution, leading to biased clustering and suboptimal quantization in existing nonlinear (NL) quantization methods. This paper introduces Boundary Suppressed K-Means Quantization (BS-KMQ), a novel NL quantization approach designed to reduce the resolution requirements of analog-to-digital converters (ADCs) in in-memory computing (IMC) systems. By suppressing boundary outliers before clustering, BS-KMQ achieves more balanced and informative NL quantization levels. The resulting NL references are implemented using a reconfigurable in-memory NL-ADC, achieving a 7x area improvement over prior NL-ADC designs. When evaluated on ResNet-18, VGG-16, Inception-V3, and DistilBERT, BS-KMQ achieves at least 3x lower quantization error compared to linear, Lloyd-Max, cumulative distribution function (CDF), and K-means methods. It also improves post-training quantization accuracy over linear quantization by up to 66.8%, 25.4%, 66.6%, and 67.7% on these models, respectively. After low-bit fine-tuning, BS-KMQ maintains competitive accuracy with significantly fewer NL-ADC levels (3/3/4/4b). System-level simulations on ResNet-18 (6/2/3b) demonstrate up to a 4x speedup and 24x energy efficiency improvement over existing IMC accelerators.
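The core idea described above — drop the mass piled up at the distribution's edges (e.g. the zero spike a ReLU leaves behind, or values pinned at a clamp limit), then run K-means on what remains to place the quantization levels — can be sketched in plain NumPy. This is only an illustrative reading of the method, not the authors' implementation: the quantile-based boundary threshold `boundary_q`, the 1-D Lloyd's iteration, and all function names are assumptions for the sketch.

```python
import numpy as np

def bs_kmeans_levels(x, n_levels, boundary_q=0.005, n_iter=50, seed=0):
    """Sketch of boundary-suppressed K-means quantization levels.

    Samples outside the (boundary_q, 1 - boundary_q) quantile band are
    treated as boundary outliers and excluded before clustering, so the
    spike ReLU or clamping creates at the edges cannot bias the centroids.
    """
    lo, hi = np.quantile(x, [boundary_q, 1.0 - boundary_q])
    inner = x[(x > lo) & (x < hi)]  # suppress boundary outliers
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(inner, n_levels, replace=False))
    for _ in range(n_iter):  # 1-D Lloyd's K-means on the inner samples
        edges = (centers[:-1] + centers[1:]) / 2.0
        idx = np.searchsorted(edges, inner)  # nearest-center assignment
        centers = np.array([
            inner[idx == k].mean() if np.any(idx == k) else centers[k]
            for k in range(n_levels)
        ])
        centers = np.sort(centers)
    return centers

def quantize(x, levels):
    """Map each value to its nearest quantization level (NL-ADC readout)."""
    edges = (levels[:-1] + levels[1:]) / 2.0
    return levels[np.searchsorted(edges, x)]

# Usage: ReLU-like activations with a heavy spike at zero.
rng = np.random.default_rng(1)
acts = np.maximum(rng.normal(size=20000), 0.0)
levels = bs_kmeans_levels(acts, n_levels=8)   # 3-bit NL levels
acts_q = quantize(acts, levels)
```

Without the suppression step, plain K-means on `acts` would devote a centroid to the zero spike and crowd the remaining levels, which is the imbalance BS-KMQ is designed to avoid.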