CMU ML · Mar 18, 2026 · arXiv:2603.17769

Modeling Overlapped Speech with Shuffles

Matthew Wiesner, Samuele Cornell, Alexander Polok, Lucas Ondel Yang, Lukáš Burget, Sanjeev Khudanpur

AI Summary

This paper introduces an approach to modeling overlapped speech that uses shuffle products and partial order finite-state automata (FSAs) for alignment and speaker-attributed transcription. The method trains on the total score of these FSAs, marginalizing over all possible serializations of the overlapping sequences, and uses temporal constraints to reduce graph size. Experiments on synthetic LibriSpeech overlaps show that the algorithm performs single-pass alignment of multi-talker recordings, which the authors state is, to their knowledge, a first.
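To make the shuffle product concrete: for two speakers' word sequences, it accepts every interleaving that preserves each speaker's own word order. The following plain-Python sketch (not k2; `shuffle_product` and `serializations` are hypothetical names chosen for illustration) builds it as an explicit FSA whose arcs carry (token, speaker) tuples and enumerates the paths for a toy example.

```python
from math import comb


def shuffle_product(a, b):
    """Shuffle product of two linear token sequences, as an explicit FSA.

    State (i, j) means i tokens of speaker A and j tokens of speaker B have been
    emitted. Each arc emits one (token, speaker) tuple and advances one stream.
    Returns (arcs, start, final); arcs maps each state to [(label, next_state)].
    """
    arcs = {}
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            out = []
            if i < len(a):
                out.append(((a[i], "A"), (i + 1, j)))
            if j < len(b):
                out.append(((b[j], "B"), (i, j + 1)))
            arcs[(i, j)] = out
    return arcs, (0, 0), (len(a), len(b))


def serializations(arcs, state, final):
    """Yield every label sequence on a path from `state` to `final`."""
    if state == final:
        yield []
        return
    for label, nxt in arcs[state]:
        for rest in serializations(arcs, nxt, final):
            yield [label] + rest


a = ["hello", "there"]
b = ["good", "morning"]
arcs, start, final = shuffle_product(a, b)
paths = list(serializations(arcs, start, final))
print(len(paths), comb(len(a) + len(b), len(a)))  # 6 6
print(paths[0])  # [('hello', 'A'), ('there', 'A'), ('good', 'B'), ('morning', 'B')]
```

Every path keeps each speaker's words in their original order while the cross-speaker order varies. The graph has only (m+1)(n+1) states even though the number of paths grows as C(m+n, m), which is why summing over serializations by dynamic programming on the graph is tractable where enumerating them is not.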

Key Contribution

Enables single-pass alignment of multi-talker speech by modeling overlaps as shuffles; to the authors' knowledge, no earlier algorithm did this.
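One way to see why alignment through a shuffle product needs only one pass: a frame-synchronous Viterbi search over the product states chooses the serialization and the frame boundaries jointly. The sketch below is plain Python, not the authors' k2/Icefall code. `viterbi_shuffle_align` is a hypothetical name, and the frame model (every frame assigned to exactly one (token, speaker) label, no blank or silence symbol, each token spanning one or more consecutive frames) is a simplifying assumption, not necessarily how the paper treats frames where both speakers are active.

```python
import math


def viterbi_shuffle_align(a, b, log_probs):
    """Frame-synchronous Viterbi over the shuffle product of two token sequences.

    a, b: token lists for speakers "A" and "B".
    log_probs: one dict per frame mapping (token, speaker) -> log-probability.
    State (i, j, s): i tokens of A and j of B started; s names the speaker whose
    latest token is being emitted in the current frame.
    Returns (best score, per-frame list of (token, speaker) labels).
    """
    m, n = len(a), len(b)

    def label(state):
        i, j, s = state
        return (a[i - 1], "A") if s == "A" else (b[j - 1], "B")

    def successors(state):
        i, j, _ = state
        yield state  # self-loop: the current token spans another frame
        if i < m:
            yield (i + 1, j, "A")  # start speaker A's next token
        if j < n:
            yield (i, j + 1, "B")  # start speaker B's next token

    # Frame 0: enter the first token of either speaker.
    starts = ([(1, 0, "A")] if m else []) + ([(0, 1, "B")] if n else [])
    scores = {st: log_probs[0].get(label(st), -math.inf) for st in starts}
    back = [{st: None for st in scores}]

    for t in range(1, len(log_probs)):
        new_scores, ptr = {}, {}
        for st, sc in scores.items():
            for nxt in successors(st):
                cand = sc + log_probs[t].get(label(nxt), -math.inf)
                if cand > new_scores.get(nxt, -math.inf):
                    new_scores[nxt], ptr[nxt] = cand, st
        scores = new_scores
        back.append(ptr)

    # Only hypotheses that used every token of both speakers are complete.
    finals = {st: sc for st, sc in scores.items() if st[0] == m and st[1] == n}
    best = max(finals, key=finals.get)
    path, st = [], best
    for t in range(len(log_probs) - 1, -1, -1):
        path.append(label(st))
        st = back[t][st]
    return finals[best], path[::-1]


a, b = ["hello", "there"], ["good", "morning"]
labels = [(w, "A") for w in a] + [(w, "B") for w in b]
target = [("hello", "A"), ("good", "B"), ("good", "B"),
          ("there", "A"), ("morning", "B"), ("morning", "B")]
# Toy frame posteriors: 0.7 on the target label, 0.1 on each of the other three.
log_probs = [{lab: math.log(0.7 if lab == tgt else 0.1) for lab in labels}
             for tgt in target]
score, path = viterbi_shuffle_align(a, b, log_probs)
print(path == target)  # True
```

Replacing the max with a log-sum-exp over incoming paths turns the same recursion into the forward algorithm, which computes a total score of the kind the abstract uses as a training loss. If there are fewer frames than tokens, no complete hypothesis exists and `max` raises `ValueError`.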

Abstract

We propose to model parallel streams of data, such as overlapped speech, using shuffles. Specifically, this paper shows how the shuffle product and partial order finite-state automata (FSAs) can be used for alignment and speaker-attributed transcription of overlapped speech. We train using the total score on these FSAs as a loss function, marginalizing over all possible serializations of overlapping sequences at subword, word, and phrase levels. To reduce graph size, we impose temporal constraints by constructing partial order FSAs. We address speaker attribution by modeling (token, speaker) tuples directly. Viterbi alignment through the shuffle product FSA directly enables one-pass alignment. We evaluate performance on synthetic LibriSpeech overlaps. To our knowledge, this is the first algorithm that enables single-pass alignment of multi-talker recordings. All algorithms are implemented using k2 / Icefall.
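The partial order idea can be sketched by pruning arcs of the shuffle product with word timings. The rule used here, that a token which ends before another token starts must precede it in every serialization, is an assumption made for illustration and may differ from the paper's constraints. `constrained_shuffle_count` is a hypothetical helper that counts complete paths by dynamic programming over product states rather than by enumeration.

```python
def constrained_shuffle_count(a, b):
    """Count serializations of two timed token streams under temporal constraints.

    a, b: lists of (token, start, end) for speakers "A" and "B", each in time order.
    Assumed rule: a token that ends before another starts must precede it, so
    arcs violating this are pruned from the shuffle product (a partial-order graph).
    Returns (number of complete paths, number of reachable states).
    """
    m, n = len(a), len(b)

    def may_take(tok, others, k):
        # Blocked if a not-yet-emitted token of the other speaker ends before tok starts.
        return not any(end < tok[1] for _, _, end in others[k:])

    count = {(0, 0): 1}
    # Row-major order visits both predecessors of (i, j) before (i, j) itself.
    for i in range(m + 1):
        for j in range(n + 1):
            c = count.get((i, j), 0)
            if c == 0:
                continue
            if i < m and may_take(a[i], b, j):
                count[(i + 1, j)] = count.get((i + 1, j), 0) + c
            if j < n and may_take(b[j], a, i):
                count[(i, j + 1)] = count.get((i, j + 1), 0) + c
    return count.get((m, n), 0), len(count)


a = [("hello", 0.0, 0.5), ("there", 0.6, 1.0)]
b = [("good", 0.4, 0.8), ("morning", 1.1, 1.5)]
print(constrained_shuffle_count(a, b))  # (3, 7)

# If every token spans the whole recording, nothing is pruned.
full = [(w, 0.0, 2.0) for w, _, _ in a], [(w, 0.0, 2.0) for w, _, _ in b]
print(constrained_shuffle_count(*full))  # (6, 9)
```

With these toy timings, "morning" starts after both of speaker A's words have ended, so it is forced to the end: 3 of the 6 serializations and 7 of the 9 product states survive. Counting is a sum over paths with unit weights; carrying arc log-probabilities and using log-sum-exp in place of `+` yields a total score over the pruned graph instead.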

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 57

Year: 2026

Venue: N/A
