This paper investigates the trade-off between capacity and adversarial robustness in neural audio codecs used as a defense mechanism for automatic speech recognition (ASR). By varying the depth of residual vector quantization (RVQ) in the codec, the authors demonstrate a non-monotonic relationship between quantization granularity, speech content preservation, and robustness against gradient-based adversarial attacks. The key finding is that intermediate RVQ depths offer the best balance, minimizing transcription error by suppressing adversarial perturbations while maintaining speech content, and that adversarial token changes correlate with transcription errors.
A Goldilocks zone exists for neural audio codec quantization depth, where intermediate levels strike the best balance between suppressing adversarial noise and preserving speech content for robust ASR.
Adversarial perturbations exploit vulnerabilities in automatic speech recognition (ASR) systems while preserving human-perceived linguistic content. Neural audio codecs impose a discrete bottleneck that can suppress fine-grained signal variations associated with adversarial noise. We examine how the granularity of this bottleneck, controlled by residual vector quantization (RVQ) depth, shapes adversarial robustness. We observe a non-monotonic trade-off under gradient-based attacks: shallow quantization suppresses adversarial perturbations but degrades speech content, while deeper quantization preserves both content and perturbations. Intermediate depths balance these effects and minimize transcription error. We further show that adversarially induced changes in discrete codebook tokens strongly correlate with transcription error. These gains persist under adaptive attacks, where neural codec configurations outperform traditional compression defenses.
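To illustrate the mechanism the abstract describes, the sketch below shows how RVQ depth controls reconstruction granularity: each stage quantizes the residual left by the previous stages, so more stages reproduce finer signal detail (and, per the paper's argument, more of an adversarial perturbation). This is a minimal toy with random, untrained codebooks and made-up sizes, not the paper's codec; each codebook includes a zero codeword so that adding a stage can never increase reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq_quantize(x, codebooks):
    """Residual vector quantization: stage k encodes the residual left by
    stages 1..k-1 with its nearest codeword; depth = number of stages."""
    residual = x.copy()
    recon = np.zeros_like(x)
    tokens = []
    for cb in codebooks:
        # squared distance from each residual vector to every codeword
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)  # nearest codeword per vector
        tokens.append(idx)
        recon = recon + cb[idx]
        residual = x - recon
    return recon, tokens

# Toy setup (hypothetical sizes): 8 stages, shrinking codeword scale, and a
# zero codeword in every stage so extra depth cannot hurt reconstruction.
dim, n_codes, max_depth = 16, 64, 8
codebooks = [
    np.vstack([np.zeros((1, dim)),
               rng.normal(size=(n_codes - 1, dim)) * 0.5 ** k])
    for k in range(max_depth)
]
x = rng.normal(size=(32, dim))

# Reconstruction error as a function of RVQ depth: it shrinks as depth grows,
# which is exactly why deeper quantization also preserves perturbations.
errors = [
    float(np.mean((x - rvq_quantize(x, codebooks[:d])[0]) ** 2))
    for d in range(1, max_depth + 1)
]
```

In a codec used as an ASR defense, the token lists returned per stage are the discrete codes whose adversarially induced changes the paper correlates with transcription error; the depth knob trades the residual error above against perturbation suppression.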