Mar 16, 2026 · arXiv:2603.15261

Two-Stage Adaptation for Non-Normative Speech Recognition: Revisiting Speaker-Independent Initialization for Personalization

Shan Jiang, Jiawen Qi, Chuanbing Huo, Yingqiang Gao, Qinyu Chen

AI Summary

This paper investigates whether speaker-independent fine-tuning (SI-FT) on multi-speaker non-normative speech data, applied before speaker-specific fine-tuning (SS-FT), improves ASR personalization for atypical speech. The authors propose a two-stage adaptation framework (SI-FT followed by SS-FT) and compare it against direct SS-FT, under identical per-speaker conditions, using Whisper-Large-v3 and Qwen3-ASR models. Results on the AphasiaBank and UA-Speech datasets show that the two-stage approach consistently improves personalization while keeping the out-of-domain trade-off on typical-speech datasets (TED-LIUM v3 and FLEURS) manageable.

Key Contribution

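Speaker-independent fine-tuning on multi-speaker non-normative speech, applied before speaker-specific fine-tuning, improves ASR personalization for speakers with dysarthric or aphasic speech compared with speaker-specific fine-tuning that starts directly from a generic pre-trained model.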

Abstract

Personalizing automatic speech recognition (ASR) systems for non-normative speech, such as dysarthric and aphasic speech, is challenging. While speaker-specific fine-tuning (SS-FT) is widely used, it is typically initialized directly from a generic pre-trained model. Whether speaker-independent adaptation provides a stronger initialization prior under such mismatch remains unclear. In this work, we propose a two-stage adaptation framework consisting of speaker-independent fine-tuning (SI-FT) on multi-speaker non-normative data followed by SS-FT, and evaluate it through a controlled comparison with direct SS-FT under identical per-speaker conditions. Experiments on AphasiaBank and UA-Speech with Whisper-Large-v3 and Qwen3-ASR, alongside evaluation on typical-speech datasets TED-LIUM v3 and FLEURS, show that two-stage adaptation consistently improves personalization while maintaining manageable out-of-domain (OOD) trade-offs.
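The controlled comparison described in the abstract can be illustrated with a deliberately tiny toy problem. The sketch below is not the authors' setup and does not involve Whisper-Large-v3 or Qwen3-ASR: it assumes a one-parameter model y = w * x, synthetic "speakers" defined by hypothetical true weights, and plain gradient descent. All names (`fine_tune`, `make_data`, `GENERIC_W`, the step counts and learning rate) are illustrative assumptions. It only shows the shape of the experiment: the same SS-FT budget is applied from two different initializations, and the results are compared on the target speaker and on "typical" data.

```python
# Toy illustration of direct SS-FT vs. two-stage (SI-FT then SS-FT) adaptation.
# Everything here is a hypothetical stand-in, not the paper's models or data.

def fine_tune(w, data, steps, lr):
    """Run `steps` gradient-descent updates on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    """Mean squared error of the one-parameter model on `data`."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def make_data(true_w, xs=(1.0, 2.0, 3.0)):
    """Noise-free (x, y) pairs for a synthetic speaker with weight `true_w`."""
    return [(x, true_w * x) for x in xs]

GENERIC_W = 1.0                      # assumed "generic pre-trained" starting point
typical = make_data(1.0)             # stand-in for typical (out-of-domain) speech
pooled = make_data(2.8) + make_data(3.0) + make_data(3.2)  # multi-speaker pool
target = make_data(3.1)              # held-out target speaker

SS_STEPS, SI_STEPS, LR = 3, 50, 0.02  # identical SS-FT budget in both arms

# Arm 1: direct SS-FT from the generic initialization.
direct_w = fine_tune(GENERIC_W, target, SS_STEPS, LR)

# Arm 2: SI-FT on the pooled speakers, then the same SS-FT on the target.
si_w = fine_tune(GENERIC_W, pooled, SI_STEPS, LR)
two_stage_w = fine_tune(si_w, target, SS_STEPS, LR)

print("target MSE, direct SS-FT:", mse(direct_w, target))
print("target MSE, two-stage:   ", mse(two_stage_w, target))
print("typical MSE, generic:    ", mse(GENERIC_W, typical))
print("typical MSE, two-stage:  ", mse(two_stage_w, typical))
```

In this toy, the two-stage arm ends closer to the target speaker because SI-FT has already moved the parameter toward the region shared by the pooled speakers, while error on the "typical" data rises relative to the generic starting point. That mirrors the personalization versus OOD trade-off the abstract refers to only qualitatively; it says nothing about the magnitude of the effects reported in the paper.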

Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
