The paper argues that standard quantization methods, developed primarily for vision and NLP, are suboptimal for speech models because audio activations can exhibit large calibration ranges. To address this, the authors introduce ESC, an Evolution Strategy-based Calibration method that treats activation scaling as an optimization problem and solves it with a two-step local-global scheme. ESC preserves performance under full INT8 quantization and achieves near-lossless performance under full INT4 quantization across multiple speech tasks.
Speech models can now be quantized to INT4 with near-lossless performance thanks to a new evolution strategy-based calibration method tailored for audio activations.
Quantization has become essential for the efficient deployment of speech processing systems. Although widely studied, most existing quantization methods were developed for vision and NLP architectures, while the specific challenges of audio signals remain largely overlooked. In particular, we show that audio activations can exhibit large calibration ranges, leading to significant information loss when standard calibration techniques are applied. To address this, we propose ESC, an Evolution Strategy-based Calibration method that formulates activation scaling as an optimization problem and solves it using a two-step local-global scheme driven by an evolution strategy. ESC enables unaltered performance under full INT8 quantization and is the first calibration method to achieve near-lossless performance for full INT4 quantization across multiple speech tasks. Integrating ESC with PTQ methods further reduces performance loss, achieving a 1% relative accuracy degradation on the AST model.
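To make the idea of "activation scaling as an optimization problem" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' ESC algorithm. It assumes symmetric per-tensor INT4 quantization, a mean-squared reconstruction error as the objective, and a simple elitist evolution strategy over a single scale. It does not reproduce the two-step local-global scheme. The function names (`fake_quantize`, `quant_mse`, `es_calibrate`) and the heavy-tailed synthetic activations are hypothetical, chosen only for illustration.

```python
import numpy as np

def fake_quantize(x, scale, bits=4):
    """Symmetric uniform quantize-dequantize with step size `scale`."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def quant_mse(x, scale, bits=4):
    """Mean-squared error introduced by quantizing x with the given scale."""
    return float(np.mean((x - fake_quantize(x, scale, bits)) ** 2))

def es_calibrate(x, bits=4, iters=50, pop=16, sigma=0.3, seed=0):
    """Elitist (1+lambda)-style evolution strategy over one activation scale.

    Assumption: the objective is reconstruction MSE on calibration data;
    the paper's actual objective and update rule may differ.
    """
    rng = np.random.default_rng(seed)
    qmax = 2 ** (bits - 1) - 1
    best_scale = float(np.max(np.abs(x)) / qmax)  # start from min-max calibration
    best_loss = quant_mse(x, best_scale, bits)
    for _ in range(iters):
        # Mutate in log-space so every candidate scale stays positive.
        candidates = best_scale * np.exp(sigma * rng.standard_normal(pop))
        losses = [quant_mse(x, s, bits) for s in candidates]
        i = int(np.argmin(losses))
        if losses[i] < best_loss:  # keep the parent unless a child improves on it
            best_scale, best_loss = float(candidates[i]), losses[i]
        sigma *= 0.95  # shrink the mutation step over time
    return best_scale, best_loss

# Heavy-tailed synthetic data standing in for audio activations (assumption).
acts = np.random.default_rng(0).standard_t(df=3, size=20000)

minmax_scale = float(np.max(np.abs(acts)) / 7)  # INT4: qmax = 7
minmax_loss = quant_mse(acts, minmax_scale)
es_scale, es_loss = es_calibrate(acts)

print(f"min-max scale={minmax_scale:.4f}  mse={minmax_loss:.5f}")
print(f"ES scale     ={es_scale:.4f}  mse={es_loss:.5f}")
```

Because the search starts from the min-max scale and only accepts improvements, the returned error can never exceed the min-max baseline. On heavy-tailed data, a few outliers stretch the min-max range, so the searched scale would typically be smaller, trading some clipping for finer resolution where most values lie.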