The paper introduces Tail-Aware Reconstruction Quantization (TARQ), a post-training quantization method for ASR that addresses the underrepresentation of rare words during calibration. TARQ uses a closed-form rule called rareBAL to equalize common/tail mass in each linear layer, combined with a metric-consistent residual correction. Experiments across eight ASR backbones and six datasets at W4G128 show that TARQ improves mean rare-word error rate without an aggregate-WER regression, and that it transfers to entity-rich benchmarks without entity supervision.
Quantizing ASR models can actually *improve* performance on rare words, without hurting overall accuracy, by strategically re-weighting the calibration data.
Data-aware post-training quantization (PTQ) minimizes a per-token reconstruction loss on a small calibration corpus, implicitly weighting positions by their empirical frequency. For Automatic Speech Recognition (ASR), this misaligns with tail-sensitive risk: names, numerals, and domain-specific words receive proportionally little calibration mass. We propose Tail-Aware Reconstruction Quantization (TARQ), a label-free PTQ framework that shifts calibration toward the lexical tail via rareBAL, a closed-form per-Linear-layer rule equalizing common/tail mass, paired with a metric-consistent residual correction. TARQ requires no entity labels, no curated calibration set, no validation decoding, and no additional training. Across eight ASR backbones and six datasets at W4G128, TARQ improves mean rare-Word Error Rate (rare-WER) without an aggregate-WER regression, achieves the lowest cross-corpus rare-WER swing among compared methods, and transfers to entity-rich benchmarks (ProfASR, ContextASR-Speech-En) without entity supervision.
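To make the frequency-weighting argument concrete, the following is a minimal sketch of mass equalization between common and tail positions in a reconstruction loss. It is an illustration only and not the paper's rareBAL rule, whose exact form is not given here. The function names, the count-based `tail_threshold` split, and the assumption that a token identity is available for every calibration position are all hypothetical.

```python
from collections import Counter


def balanced_position_weights(tokens, tail_threshold):
    """Per-position weights that give common and tail positions equal total mass.

    Hypothetical rule for illustration: a token counts as "tail" when it
    occurs at most `tail_threshold` times in the calibration tokens.
    """
    counts = Counter(tokens)
    is_tail = [counts[t] <= tail_threshold for t in tokens]
    n = len(tokens)
    n_tail = sum(is_tail)
    n_common = n - n_tail
    if n_tail == 0 or n_common == 0:
        # Only one group is present, so fall back to uniform weighting.
        return [1.0] * n
    # Each group receives half of the total mass n.
    w_tail = n / (2 * n_tail)
    w_common = n / (2 * n_common)
    return [w_tail if tail else w_common for tail in is_tail]


def weighted_reconstruction_loss(errors, weights):
    """Weighted mean of per-position squared reconstruction errors."""
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


# Toy calibration stream: six common positions and two rare ones.
tokens = ["the"] * 6 + ["Okafor", "seventeen"]
# Made-up per-position squared errors: rare words reconstruct worse.
errors = [0.1] * 6 + [0.5, 0.5]

uniform = weighted_reconstruction_loss(errors, [1.0] * len(tokens))
weights = balanced_position_weights(tokens, tail_threshold=1)
balanced = weighted_reconstruction_loss(errors, weights)
print(uniform, balanced)
```

In this toy setup the two rare positions carry a quarter of the mass under uniform weighting and half of it after balancing, so the balanced loss is pulled toward the tail error. An optimizer minimizing it would therefore spend more of its quantization budget on the rare positions.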