Apr 29, 2026 · arXiv:2604.27204

Selective Augmentation: Improving Universal Automatic Phonetic Transcription via G2P Bootstrapping

Tobias Bystrich, Julia M. Pritzen, Christoph A. Schmidt, Claudia Wich-Reif

AI Summary

This paper introduces Selective Augmentation, a bootstrapping method that improves the training transcriptions for universal automatic phonetic transcription (APT) by selectively transferring phonetic distinctions from a helper language. Building on the MultIPA model, the authors use Hindi as a helper language to augment the training data, improving plosive voicing and introducing plosive aspiration, and evaluate on German. Voicing accuracy increases by 17.6%, and the model learns to transcribe aspiration. This reduces the tenuis class by 32.2% and, with it, the conflations among the test language's plosives.

Key Contribution

Selectively transferring phonetic distinctions from a helper language can improve automatic phonetic transcription and can even introduce phonetic features that the original training transcriptions do not mark.

Abstract

In the field of universal automatic phonetic transcription (APT), clean and diverse training transcriptions are required. However, such high-quality data is limited. We propose the bootstrapping approach Selective Augmentation, which improves the available training transcriptions by selectively transferring distinctions between languages. Using the model MultIPA, we show as an example that augmenting the existing training data with information from a separate helper language (Hindi) increases the accuracy of an existing feature (plosive voicing) and adds a new feature (plosive aspiration). We describe intrinsic challenges of the evaluation and develop objective metrics to measure success: voicing accuracy increased by 17.6% through a reduction in false positives. Additionally, aspiration recognition was introduced: while the baseline transcribed 0% of German /p, t, k/ as aspirated, our approach transcribed them as aspirated in 61.2% of cases. Introducing aspiration recognition to APT models reduced the tenuis class by 32.2%, which also reduces the conflations between the test language's plosives.
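The abstract reports measurements such as the share of German /p, t, k/ transcribed as aspirated and the size of the tenuis (plain voiceless unaspirated) class. The sketch below shows one way such quantities could be computed from phone-aligned reference/hypothesis pairs. It is not the authors' evaluation code: the alignment step, the token format (one IPA phone per string, aspiration marked with U+02B0 ʰ), the function names, and the assumption that reported reductions are relative changes are all illustrative.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumed plosive inventory (illustrative only): base symbols of voiceless and voiced plosives.
VOICELESS = {"p", "t", "k"}
VOICED = {"b", "d", "\u0261", "g"}  # accept both IPA ɡ (U+0261) and ASCII g
ASPIRATION = "\u02b0"  # modifier letter small h (ʰ)


def plosive_class(phone: str) -> Optional[str]:
    """Map one IPA phone token to 'tenuis', 'aspirated', or 'voiced'; None if not a plosive."""
    if not phone:
        return None  # empty string, e.g. a deletion in the alignment
    base, diacritics = phone[0], phone[1:]
    if base in VOICELESS:
        return "aspirated" if ASPIRATION in diacritics else "tenuis"
    if base in VOICED:
        return "voiced"
    return None


def aspiration_rate(pairs: List[Tuple[str, str]]) -> float:
    """Share of reference /p, t, k/ whose aligned hypothesis phone carries aspiration."""
    hyps = [hyp for ref, hyp in pairs if ref in VOICELESS]
    if not hyps:
        return 0.0
    return sum(plosive_class(h) == "aspirated" for h in hyps) / len(hyps)


def class_counts(hyp_phones: Iterable[str]) -> Counter:
    """Count hypothesis plosives per class (non-plosives are ignored)."""
    return Counter(c for c in map(plosive_class, hyp_phones) if c is not None)


def relative_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new; negative values are reductions."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy aligned pairs (made up for illustration, not data from the paper).
    pairs = [("p", "p\u02b0"), ("t", "t"), ("k", "k\u02b0"), ("b", "b"), ("t", "d")]
    print(aspiration_rate(pairs))  # 2 of 4 reference /p, t, k/ are aspirated
    print(class_counts(h for _, h in pairs))
```

Comparing `class_counts(...)["tenuis"]` for a baseline and an augmented model with `relative_change` would give a tenuis-class reduction in the sense assumed here; a real evaluation would also need the phone alignment, which this sketch takes as given.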

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
