The paper investigates how speaker identity affects speech spoofing detection systems, challenging the common assumption that their embeddings are speaker-independent. The authors propose a Speaker-Invariant Multi-Task (SInMT) framework with two variants, one that models speaker identity in the embeddings and one that removes it, using multi-task learning with a gradient reversal layer. Experiments on four datasets show that the speaker-invariant model reduces the average equal error rate by 17% relative to the baseline, and by up to 48% for specific attacks, highlighting the importance of addressing speaker variability.
Speaker identity affects spoofing detection, and, surprisingly, removing speaker-specific information from the embeddings improves performance, with the largest gains on the most challenging attacks.
Spoofing detection systems are typically trained using diverse recordings from multiple speakers, often assuming that the resulting embeddings are independent of speaker identity. However, this assumption remains unverified. In this paper, we investigate the impact of speaker information on spoofing detection systems. We propose two approaches within our Speaker-Invariant Multi-Task (SInMT) framework: one that models speaker identity within the embeddings and another that removes it. SInMT integrates multi-task learning for joint speaker recognition and spoofing detection, incorporating a gradient reversal layer. Evaluated on four datasets, our speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with up to 48% reduction for the most challenging attacks (e.g., A11).
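To make the gradient reversal idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the usual formulation of a gradient reversal layer (identity in the forward pass, gradient multiplied by a negative factor in the backward pass) and assumes the layer sits between the shared embedding and the speaker recognition head in the speaker-invariant variant. The class name `GradientReversal`, the helper `shared_embedding_gradient`, the strength `lam`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not a value from the paper

    def forward(self, x: List[float]) -> List[float]:
        # The embedding passes through unchanged on the way to the speaker head.
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        # Flipping the sign pushes the shared encoder to *increase* the
        # speaker loss, which discourages speaker cues in the embedding.
        return [-self.lam * g for g in grad_output]


def shared_embedding_gradient(
    grad_spoof: List[float],
    grad_speaker: List[float],
    grl: GradientReversal,
) -> List[float]:
    """Sum the gradients that reach the shared embedding from both task heads."""
    reversed_speaker = grl.backward(grad_speaker)
    return [gs + gr for gs, gr in zip(grad_spoof, reversed_speaker)]


# Illustrative values only (assumed, not taken from the paper).
grl = GradientReversal(lam=0.5)
embedding = [0.2, -1.0, 0.7]
grad_spoof = [0.1, 0.4, -0.2]    # gradient of the spoofing loss w.r.t. the embedding
grad_speaker = [0.6, -0.2, 0.8]  # gradient of the speaker loss w.r.t. the embedding

combined = shared_embedding_gradient(grad_spoof, grad_speaker, grl)
print(combined)
```

Under these assumptions, the speaker head itself still learns to recognize speakers, while the shared encoder receives the negated speaker gradient alongside the ordinary spoofing gradient. Dropping the reversal (passing the speaker gradient through unchanged) would correspond to a plain multi-task setup that models speaker identity instead of removing it.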