Mar 2, 2026 | arXiv:2603.01482

A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

Hashim Ali, Nithin Sai Adupa, Surya Subramani, Hafiz Malik

AI Summary

The authors introduce Spoof-SUPERB, a benchmark for evaluating self-supervised learning (SSL) models on audio deepfake detection across diverse architectures (generative, discriminative, spectrogram-based). They systematically evaluated 20 SSL models on in-domain and out-of-domain datasets, finding that large-scale discriminative models like XLS-R, UniSpeech-SAT, and WavLM Large exhibit superior performance and robustness. The benchmark provides a reproducible baseline and insights into SSL representation reliability for securing speech systems against audio deepfakes.

Key Contribution

The Spoof-SUPERB benchmark shows that large-scale discriminative SSL models detect audio deepfakes more accurately than other SSL models, and that they stay resilient under acoustic degradations where generative approaches degrade sharply.

Abstract

Self-supervised learning (SSL) has transformed speech processing, with benchmarks such as SUPERB establishing fair comparisons across diverse downstream tasks. Despite its security-critical importance, audio deepfake detection has remained outside these efforts. In this work, we introduce Spoof-SUPERB, a benchmark for audio deepfake detection that systematically evaluates 20 SSL models spanning generative, discriminative, and spectrogram-based architectures. We evaluate these models on multiple in-domain and out-of-domain datasets. Our results reveal that large-scale discriminative models such as XLS-R, UniSpeech-SAT, and WavLM Large consistently outperform other models, benefiting from multilingual pretraining, speaker-aware objectives, and model scale. We further analyze the robustness of these models under acoustic degradations, showing that generative approaches degrade sharply, while discriminative models remain resilient. This benchmark establishes a reproducible baseline and provides practical insights into which SSL representations are most reliable for securing speech systems against audio deepfakes.
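The abstract does not state how detection performance is scored. A metric commonly used in audio anti-spoofing work is the equal error rate (EER), the operating point where the rate of accepted spoofs equals the rate of rejected bona fide utterances. The following is a minimal sketch, assuming (this is not confirmed by the abstract) that models in a benchmark of this kind are compared by EER. The function name `equal_error_rate`, the label convention, and the toy scores are illustrative only and are not taken from the paper.

```python
def equal_error_rate(scores, labels):
    """Approximate the equal error rate (EER) of a detector.

    Assumed convention (hypothetical, not from the paper):
    a higher score means "more likely bona fide";
    labels are 1 for bona fide and 0 for spoofed audio.
    """
    bona = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    if not bona or not spoof:
        raise ValueError("need at least one bona fide and one spoofed score")

    best_gap, eer = None, None
    # Try every observed score as a threshold, plus +inf (reject everything).
    for t in sorted(set(scores)) + [float("inf")]:
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bona) / len(bona)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
separable = equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
overlapping = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
print(separable, overlapping)
```

Under this assumption, running the same function on in-domain and out-of-domain test sets would give one number per model per dataset, and a lower EER would indicate a better detector.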

Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
