May 5, 2026 · arXiv:2605.03297

Contrastive Regularization for Accent-Robust ASR

Van-Phat Thai, Aradhya Dhruv, D. Pham, Sameer Alam

AI Summary

This paper explores supervised contrastive learning (SupCon) as an auxiliary objective to improve the accent robustness of CTC-finetuned ASR systems based on self-supervised pretraining. SupCon is used as an utterance-level contrastive loss to regularize encoder representations without requiring architectural changes or explicit accent labels. Experiments on L2-ARCTIC demonstrate consistent WER reductions, up to 25-29% relative, on unseen accents, indicating improved accent invariance.
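A minimal NumPy sketch of an utterance-level supervised contrastive loss may make the idea concrete. It follows the standard SupCon formulation, not necessarily the authors' exact variant. Several things here are assumptions: the function name `supcon_loss`, the `temperature` default, and the choice of grouping label. The summary says no accent labels are needed, so the sketch assumes positives are utterances that share a transcript, which the source does not spell out.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over pooled utterance embeddings.

    embeddings: (N, D) array, one pooled encoder vector per utterance.
    labels: length-N group ids (ASSUMPTION: transcript ids, not accent labels).
    temperature: softmax temperature (hypothetical default, not from the paper).
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = z.shape[0]

    # Positives for each anchor: other utterances carrying the same label.
    self_mask = np.eye(n, dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    # Cosine similarities scaled by temperature, with self-pairs excluded.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over every other sample in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))

    # Average log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())
```

As an auxiliary objective, this term would typically be added to the CTC loss with a scalar weight, for example `total = ctc + aux_weight * supcon_loss(pooled, transcript_ids)`, where `aux_weight` and the pooling step are hypothetical and would need tuning.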

Key Contribution

Adding an utterance-level supervised contrastive loss to CTC fine-tuning reduces WER on unseen accents by up to 25-29% relative, with no architectural changes and no explicit accent labels.

Abstract

ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with up to 25-29% relative reduction under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
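The abstract does not define within-transcript cosine dispersion, so the sketch below is only one plausible reading. It assumes the metric is the mean cosine distance between each utterance embedding and the centroid of all utterances sharing its transcript, averaged over transcripts. The function name `within_transcript_dispersion` and the handling of single-utterance transcripts are illustrative choices, not the authors' definition.

```python
from collections import defaultdict

import numpy as np

def within_transcript_dispersion(embeddings, transcript_ids):
    """Mean cosine distance to the per-transcript centroid (ASSUMED definition).

    embeddings: (N, D) array of pooled utterance embeddings.
    transcript_ids: length-N sequence identifying each utterance's transcript.
    Lower values mean same-transcript utterances sit closer together.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Collect the row indices belonging to each transcript.
    groups = defaultdict(list)
    for i, t in enumerate(transcript_ids):
        groups[t].append(i)

    per_group = []
    for idx in groups.values():
        if len(idx) < 2:
            continue  # a single utterance has no spread to measure
        members = z[idx]
        centroid = members.mean(axis=0)
        # Guard against a degenerate zero centroid before normalizing.
        centroid = centroid / max(np.linalg.norm(centroid), 1e-12)
        per_group.append(float(np.mean(1.0 - members @ centroid)))

    return float(np.mean(per_group)) if per_group else 0.0
```

Under this reading, comparing the value for a CTC-only encoder against a CTC plus SupCon encoder on the same utterances would show whether the auxiliary loss tightens same-transcript clusters across accents.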

Natural Language Processing · Speech & Audio · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 17

Year: 2026

Venue: N/A
