Jun 4, 2026 | arXiv:2606.06065

Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

AI Summary

This study investigates the effectiveness of multi-task learning (MTL) in dual-output second language (L2) speech recognition, specifically comparing Korean and English. The authors find that while MTL enhances meaning recognition, it significantly impairs surface transcription accuracy in English, particularly as the divergence between surface and meaning increases. Encoder analysis reveals that Korean maintains distinct task representations, whereas English suffers from representational entanglement, leading to degraded performance in transcription tasks.

Key Contribution

MTL may boost meaning recognition in L2 speech tasks, but it can severely compromise transcription accuracy, especially in English.

Abstract

Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.
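The surface-meaning divergence mentioned in the abstract is measured with Levenshtein edit distance. The following is a minimal sketch of that metric, not the authors' implementation. It assumes character-level comparison (the paper may instead operate on phonemes, words, or another unit), and the function names `levenshtein` and `normalized_divergence`, the length normalization, and the example strings are all hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from `a`
                current[j - 1] + 1,      # insertion into `a`
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_divergence(surface: str, meaning: str) -> float:
    """Edit distance scaled by the longer string's length.
    The normalization is an assumption, not taken from the paper."""
    longest = max(len(surface), len(meaning))
    return levenshtein(surface, meaning) / longest if longest else 0.0


# Hypothetical L2 example: what was pronounced vs. what was intended.
surface = "I sink so"
meaning = "I think so"
print(levenshtein(surface, meaning), normalized_divergence(surface, meaning))
```

Under this sketch, an utterance whose surface and meaning transcriptions are identical has a divergence of zero, and the value grows as the pronounced form departs from the intended one.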

Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
