NVIDIA, NTU Taiwan | Apr 19, 2026 | arXiv:2604.17435

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

Szu-Chi Chen, I-Ning Tsai, Yi-Cheng Lin, Sung-Feng Huang, Hung-yi Lee

AI Summary

The paper introduces MoVE, a Mixture-of-LoRA-Experts architecture, to address the stripping of non-verbal vocalizations (NVs) by Speech-to-Speech Translation (S2ST) systems. The authors build a synthesis pipeline for generating expressive datasets and train MoVE with expressive-specialized adapters and a soft-weighting router. On English-Chinese S2ST, MoVE reproduces target NVs in 76% of cases, versus at most 14% for existing systems, and achieves the highest human-rated naturalness and emotional fidelity among the compared systems.

Key Contribution

Speech-to-speech translation can carry laughter and crying across languages with a data-efficient approach: LoRA experts on a pretrained AudioLLM, trained on about 30 minutes of curated data.

Abstract

Recent Speech-to-Speech Translation (S2ST) systems achieve strong semantic accuracy yet consistently strip away non-verbal vocalizations (NVs), such as laughter and crying, that convey pragmatic intent, which severely limits their real-world utility. We address this via three contributions. First, we propose a synthesis pipeline for building scalable expressive datasets to overcome data scarcity. Second, we propose MoVE, a Mixture-of-LoRA-Experts architecture with expressive-specialized adapters and a soft-weighting router that blends experts to capture hybrid expressive states. Third, we show that pretrained AudioLLMs enable striking data efficiency: 30 minutes of curated data is enough for strong performance. On English-Chinese S2ST, compared with strong baselines, MoVE reproduces target NVs in 76% of cases, whereas existing S2ST systems preserve at most 14% of NVs, and it achieves the highest human-rated naturalness and emotional fidelity among all compared systems.
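
The abstract describes the architecture only at a high level, so the following is a minimal sketch of one common way to blend LoRA experts with a soft-weighting router, not the authors' implementation. It assumes a single frozen linear layer, a linear router followed by a softmax over all experts (no top-k selection), and the standard LoRA initialization. The class name `SoftMoLoRALinear`, the number of experts, the rank, and `alpha` are all hypothetical choices made for illustration.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned or pretrained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SoftMoLoRALinear:
    """Frozen linear layer plus a softly weighted blend of LoRA experts.

    Illustrative only: sizes, router placement, and init are assumptions.
    """

    def __init__(self, d_in, d_out, num_experts=3, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.base = rand_matrix(d_out, d_in, rng)          # frozen base weight
        self.router = rand_matrix(num_experts, d_in, rng)  # one logit row per expert
        # Standard LoRA init: down-projection random, up-projection zero,
        # so every expert starts out as a no-op on top of the base layer.
        self.down = [rand_matrix(rank, d_in, rng) for _ in range(num_experts)]
        self.up = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]
        self.scale = alpha / rank

    def gate(self, x):
        """Soft weights over all experts; they are positive and sum to 1."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        """y = W x + sum_k g_k(x) * (alpha / r) * B_k A_k x"""
        out = matvec(self.base, x)
        for g, a, b in zip(self.gate(x), self.down, self.up):
            delta = matvec(b, matvec(a, x))
            out = [o + g * self.scale * d for o, d in zip(out, delta)]
        return out


layer = SoftMoLoRALinear(d_in=4, d_out=3)
x = [0.5, -1.0, 0.25, 2.0]
weights = layer.gate(x)  # one blending weight per expert
y = layer.forward(x)     # base output plus the blended expert updates
```

Because every expert receives a nonzero weight, an input that mixes expressive cues (for example, speech containing both laughter and crying) can draw on several adapters at once, which is the intuition behind blending experts for hybrid expressive states.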

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
