The paper addresses language-speaker entanglement in cross-lingual speaker verification, which degrades performance when verifying speakers across different languages. The authors introduce Dual-LoRA, which uses task-factorized LoRA adapters and a language-anchored adversarial discriminator to disentangle language and speaker information. Experiments on the TidyVoice benchmark show that Dual-LoRA significantly improves performance, achieving a 0.91% validation EER and 3rd place in the challenge.
Adversarial training doesn't have to hurt speaker verification: by explicitly modeling language, you can disentangle speaker and language characteristics without sacrificing speaker discriminability.
Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker across different languages while rejecting those from different speakers sharing the same language. Standard adversarial disentanglement degrades speaker discriminability, because blind discriminators inadvertently penalize speaker-discriminative traits that merely correlate with language. To address this, we propose Dual-LoRA, which injects trainable task-factorized LoRA adapters into a frozen pre-trained backbone. Our core innovation is a Language-Anchored Adversary: by grounding the discriminator with an explicit language branch, adversarial gradients target true linguistic cues rather than arbitrary correlations, preserving essential speaker characteristics. Evaluated on the TidyVoice benchmark, our system achieves a 0.91% validation EER and secures 3rd place in the official challenge.
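The abstract describes injecting trainable, task-factorized LoRA adapters into a frozen backbone. Below is a minimal PyTorch sketch of that idea, assuming "task-factorized" means separate low-rank branches for the speaker and language tasks sharing one frozen layer; the class and parameter names (`TaskFactorizedLoRA`, `r`, `alpha`) are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class TaskFactorizedLoRA(nn.Module):
    """A frozen linear layer with two independent low-rank (LoRA) updates,
    one per task. Only the A/B matrices train; the backbone weight does not.
    Hypothetical sketch -- the paper's exact factorization may differ."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained backbone frozen

        def branch() -> nn.Sequential:
            A = nn.Linear(base.in_features, r, bias=False)
            B = nn.Linear(r, base.out_features, bias=False)
            nn.init.zeros_(B.weight)  # adapter starts as a no-op
            return nn.Sequential(A, B)

        self.speaker_lora = branch()
        self.language_lora = branch()
        self.scale = alpha / r

    def forward(self, x: torch.Tensor, task: str = "speaker") -> torch.Tensor:
        lora = self.speaker_lora if task == "speaker" else self.language_lora
        return self.base(x) + self.scale * lora(x)
```

Zero-initializing the B matrix is the standard LoRA trick: training starts exactly at the frozen backbone's behavior, so the adapters only add what the new task requires.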
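The Language-Anchored Adversary is only described at a high level. One plausible reading is a gradient-reversal discriminator whose class weights are explicit language embeddings (anchors), so the reversed gradient pushes the speaker embedding away from genuinely linguistic directions rather than arbitrary correlates. The sketch below follows that interpretation; all names (`GradReverse`, `lambd`, `anchors`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient on the
    way back, so training the discriminator to classify language pushes the
    upstream encoder toward language confusion."""

    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class LanguageAnchoredAdversary(nn.Module):
    """Scores a speaker embedding against one learned anchor per language.
    Hypothetical reconstruction: anchoring the discriminator in explicit
    language directions aims the reversed gradients at linguistic cues."""

    def __init__(self, spk_dim: int, n_langs: int, lambd: float = 0.5):
        super().__init__()
        self.anchors = nn.Embedding(n_langs, spk_dim)  # one anchor per language
        self.lambd = lambd

    def forward(self, spk_emb: torch.Tensor) -> torch.Tensor:
        h = GradReverse.apply(spk_emb, self.lambd)
        # cosine similarity to each language anchor serves as the logits;
        # only the path through h is gradient-reversed, so the anchors
        # themselves still learn to classify language normally
        h = F.normalize(h, dim=-1)
        a = F.normalize(self.anchors.weight, dim=-1)
        return h @ a.t()

# Usage: cross-entropy on these logits trains the anchors to locate language
# directions while the reversed gradient strips them from the speaker branch.
# adversary = LanguageAnchoredAdversary(spk_dim=256, n_langs=10)
# loss_adv = F.cross_entropy(adversary(spk_emb), lang_labels)
```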