This study investigates the effectiveness of multi-task learning (MTL) in dual-output second-language (L2) speech recognition, comparing Korean and English. The authors find that MTL improves meaning recognition but degrades surface transcription, especially in English, where the degradation grows with the divergence between surface and meaning. Encoder analysis shows that Korean preserves distinct task representations, whereas English produces nearly identical ones, and this encoder-level entanglement is linked to the degraded surface transcription.
MTL can improve meaning recognition in dual-output L2 speech recognition, but it degrades surface transcription accuracy, especially in English.
Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.
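The surface-meaning divergence mentioned in the abstract is measured by Levenshtein edit distance. The following is a minimal sketch of how such a score could be computed for one utterance. It assumes character-level distance without normalization; the abstract does not state the unit or any normalization the authors used. The function name `levenshtein` and the example transcript pair are hypothetical, chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical L2 utterance: what was pronounced vs. what was intended.
surface = "i sink so"
meaning = "i think so"
divergence = levenshtein(surface, meaning)
print(divergence)
```

Under these assumptions, a larger value means the surface transcription and the intended meaning differ more, which is the condition under which the abstract reports greater surface degradation for English.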