Mar 11, 2026 · arXiv:2603.10827

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

Thomas Thebaud, Yuzhe Wang, L. Moro-Velázquez, Jesus Villalba-Lopez, N. Dehak

AI Summary

This paper investigates the speaker verification capabilities of speech-aware LLMs and finds that they perform poorly out of the box, with EERs above 20% on VoxCeleb1. To address this, the authors introduce ECAPA-LLM, an augmentation that injects frozen ECAPA-TDNN speaker embeddings into an LLM through a learned projection while training only LoRA adapters. On TinyLLaMA-1.1B, the resulting model achieves a 1.03% EER on VoxCeleb1-E, approaching a dedicated speaker verification system while preserving a natural-language interface.
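The embedding injection described in the summary can be pictured with a small sketch. This is not the authors' implementation: the toy dimensions, the choice of one pseudo-token per utterance, and the names `project_speaker_embedding` and `inject` are all assumptions made for illustration. It only shows the general idea of mapping a fixed speaker embedding into the LLM's token-embedding space with an affine projection and placing the result in front of the prompt.

```python
import random

def project_speaker_embedding(emb, weight, bias):
    """Affine map from speaker-embedding space to the LLM token-embedding space.

    `weight` has one row per LLM embedding dimension (hypothetical layout).
    """
    return [sum(w * x for w, x in zip(row, emb)) + b
            for row, b in zip(weight, bias)]

def inject(prompt_embs, enroll_emb, test_emb, weight, bias):
    """Prepend one projected pseudo-token per utterance to the prompt embeddings.

    Using exactly two pseudo-tokens (enrollment, test) is an assumption.
    """
    enroll_tok = project_speaker_embedding(enroll_emb, weight, bias)
    test_tok = project_speaker_embedding(test_emb, weight, bias)
    return [enroll_tok, test_tok] + prompt_embs

# Toy sizes for illustration only; real models use much larger dimensions.
SPEAKER_DIM, LLM_DIM, PROMPT_LEN = 4, 6, 3
rng = random.Random(0)

# In the setup described above, the speaker encoder stays frozen and the
# projection is learned; here the projection is just randomly initialised.
weight = [[rng.uniform(-0.1, 0.1) for _ in range(SPEAKER_DIM)]
          for _ in range(LLM_DIM)]
bias = [0.0] * LLM_DIM

enroll_emb = [rng.gauss(0, 1) for _ in range(SPEAKER_DIM)]
test_emb = [rng.gauss(0, 1) for _ in range(SPEAKER_DIM)]
prompt_embs = [[rng.gauss(0, 1) for _ in range(LLM_DIM)]
               for _ in range(PROMPT_LEN)]

sequence = inject(prompt_embs, enroll_emb, test_emb, weight, bias)
```

The resulting `sequence` would then be fed to the LLM in place of ordinary token embeddings, with the LoRA adapters and the projection being the only trained parameters under the description above.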

Key Contribution

Speech-aware LLMs perform poorly at speaker verification out of the box, but injecting frozen speaker embeddings through a learned projection brings an LLM close to dedicated systems while preserving its natural-language interface.

Abstract

Speech-aware large language models (LLMs) can accept speech inputs, yet their training objectives largely emphasize linguistic content or specific fields such as emotions or the speaker's gender, leaving it unclear whether they encode speaker identity. First, we propose a model-agnostic scoring protocol that produces continuous verification scores for both API-only and open-weight models, using confidence scores or log-likelihood ratios from the Yes/No token probabilities. Using this protocol, we benchmark recent speech-aware LLMs and observe weak speaker discrimination (EERs above 20% on VoxCeleb1). Second, we introduce a lightweight augmentation that equips an LLM with ASV capability by injecting frozen ECAPA-TDNN speaker embeddings through a learned projection and training only LoRA adapters. On TinyLLaMA-1.1B, the resulting ECAPA-LLM achieves 1.03% EER on VoxCeleb1-E, approaching a dedicated speaker verification system while preserving a natural-language interface.
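A minimal sketch of the two quantities the abstract relies on may help: a continuous score derived from Yes/No token probabilities, and the equal error rate (EER) computed from such scores. The function names and the threshold-sweep approximation are assumptions for illustration, not the paper's code, and the trial scores below are made up solely to exercise the functions.

```python
import math

def llr_from_logprobs(logp_yes: float, logp_no: float) -> float:
    """Log-likelihood ratio between the 'Yes' and 'No' answers.

    Works with log-probabilities or raw logits, since any shared
    normalisation constant cancels in the difference.
    """
    return logp_yes - logp_no

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Take the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Made-up scores for illustration only (not results from the paper).
same_speaker = [2.0, 1.5, 0.2, 3.0]
different_speaker = [-1.0, 0.5, -2.0, -0.3]
toy_eer = equal_error_rate(same_speaker, different_speaker)
```

A higher score means the model leans toward "Yes, same speaker"; the EER is the operating point where false acceptances and false rejections occur at the same rate, so lower is better.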

Eval Frameworks & Benchmarks · Open-Source Models & Weights · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
