This paper addresses the challenge of merging heterogeneous language models (LMs) in federated hybrid automatic speech recognition (ASR) systems, where both n-gram and neural network LMs are present. The authors introduce a "match-and-merge" paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA) and the Reinforced Match-and-Merge Algorithm (RMMA). RMMA, which uses reinforcement learning to pair and merge LMs, achieves the lowest average Character Error Rate (CER) across seven OpenSLR datasets and converges up to seven times faster than GMMA, pointing to its potential for scalable, privacy-preserving ASR.
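To make the "match" half of match-and-merge concrete, here is a minimal sketch that treats choosing which pair of local LMs to merge as a multi-armed bandit, a one-step simplification of reinforcement learning. It is not the authors' RMMA: the epsilon-greedy policy, the reward of negative CER, and the `evaluate_pair` callback (which would merge LMs i and j and score the result on held-out speech) are all assumptions for illustration, and the merge itself is left abstract.

```python
import random
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]


def bandit_match(
    n_models: int,
    evaluate_pair: Callable[[int, int], float],
    steps: int = 200,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Tuple[Pair, Dict[Pair, float]]:
    """Choose which pair of local LMs to merge with an epsilon-greedy bandit.

    evaluate_pair(i, j) is a hypothetical callback: merge LMs i and j and
    return the CER of the merged LM on held-out data (lower is better).
    The reward is -CER, so maximizing reward minimizes CER.
    """
    rng = random.Random(seed)
    arms: List[Pair] = list(combinations(range(n_models), 2))
    # Running mean reward per pair; 0.0 is optimistic because rewards are <= 0,
    # so the greedy step tries every untried pair before revisiting one.
    value = {a: 0.0 for a in arms}
    count = {a: 0 for a in arms}
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(arms)  # explore: try a random pairing
        else:
            arm = max(arms, key=lambda a: value[a])  # exploit: best pairing so far
        reward = -evaluate_pair(*arm)
        count[arm] += 1
        value[arm] += (reward - value[arm]) / count[arm]  # incremental mean
    best = max((a for a in arms if count[a] > 0), key=lambda a: value[a])
    return best, value


# Toy CER table standing in for real merge-and-evaluate runs (made-up numbers).
toy_cer = {(0, 1): 0.12, (0, 2): 0.09, (0, 3): 0.15,
           (1, 2): 0.11, (1, 3): 0.10, (2, 3): 0.14}
best, values = bandit_match(4, lambda i, j: toy_cer[(i, j)])
print(best)  # (0, 2)
```

A full RL formulation would also carry state across steps (for example, which LMs have already been merged), which a bandit ignores; the sketch only shows why a learned pairing policy can settle on a good pair faster than re-evaluating a whole population each generation, as a genetic search does.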
Reinforcement learning can efficiently merge heterogeneous language models in federated ASR, outperforming a genetic-algorithm baseline and lowering character error rate.
Training automatic speech recognition (ASR) models increasingly relies on decentralized federated learning to ensure data privacy and accessibility, producing multiple local models that require effective merging. In hybrid ASR systems, acoustic models can be merged using established methods, but the language model (LM) used to rescore the N-best recognition list is harder to merge because the local LMs are heterogeneous: some are non-neural n-gram models and others are neural networks. This paper proposes a heterogeneous LM optimization task and introduces a match-and-merge paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA), which uses genetic operations to evolve and pair LMs, and the Reinforced Match-and-Merge Algorithm (RMMA), which leverages reinforcement learning for efficient convergence. Experiments on seven OpenSLR datasets show that RMMA achieves the lowest average Character Error Rate and better generalization than the baselines, converging up to seven times faster than GMMA, highlighting the paradigm's potential for scalable, privacy-preserving ASR systems.
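The abstract's setting, rescoring an N-best list with a mix of n-gram and neural LMs and measuring Character Error Rate, can be sketched as follows. This is an illustrative sketch, not the paper's merge operator: it assumes each hypothesis already carries an acoustic log-score and log-probabilities from one n-gram LM and one neural LM, combines the two LMs log-linearly with a hypothetical weight `alpha`, and scales the result by a hypothetical `lm_weight`. Linear interpolation in probability space is an equally common alternative. CER is computed as the character-level Levenshtein distance divided by the reference length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    am: float     # acoustic-model log-score
    ngram: float  # log P(text) under the n-gram LM
    nn: float     # log P(text) under the neural LM


def rescore(nbest: List[Hypothesis], lm_weight: float, alpha: float) -> str:
    """Return the text of the hypothesis with the highest combined score.

    The n-gram and neural LM log-probabilities are combined log-linearly
    (weight alpha on the n-gram LM), scaled by lm_weight, and added to the
    acoustic score.
    """
    def score(h: Hypothesis) -> float:
        lm = alpha * h.ngram + (1 - alpha) * h.nn
        return h.am + lm_weight * lm
    return max(nbest, key=score).text


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character-level edit distance / len(ref)."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (free on match)
        prev = cur
    return prev[-1] / max(len(ref), 1)


# Made-up scores for a three-entry N-best list.
nbest = [
    Hypothesis("recognize speech", am=-10.0, ngram=-8.0, nn=-6.0),
    Hypothesis("wreck a nice beach", am=-9.5, ngram=-12.0, nn=-11.0),
    Hypothesis("recognise speech", am=-10.2, ngram=-9.5, nn=-7.0),
]
print(rescore(nbest, lm_weight=0.0, alpha=0.5))  # wreck a nice beach
print(rescore(nbest, lm_weight=0.5, alpha=0.5))  # recognize speech
print(cer("recognize speech", "recognise speech"))  # 0.0625
```

With the LM weight at zero, the acoustically best but wrong hypothesis wins; adding the combined LM score flips the choice. In a federated setting, the question the paper studies is how to pick and merge the local LMs that produce the `ngram` and `nn` scores so that this rescoring step lowers CER.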