Beihang, Jilin, SUSTech, Tencent AI
Apr 29, 2026 · arXiv:2604.26465

Diffusion Reconstruction towards Generalizable Audio Deepfake Detection

Bo Cheng, Songjun Cao, Xiaoming Zhang, Jie Chen, Fei Chen

AI Summary

This paper tackles the challenge of generalizing audio deepfake detection (ADD) to unseen attacks by focusing on hard sample classification. The authors generate hard samples with a diffusion-based reconstruction method, which they find superior to the other reconstruction paradigms they compare. The approach also incorporates multi-layer feature aggregation and a Regularization-Assisted Contrastive Learning (RACL) objective to improve generalizability, and the authors report state-of-the-art results.
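The summary does not say which diffusion model, noise level, or input representation the authors use for reconstruction. As a rough illustration only, the sketch below shows the generic DDPM-style idea behind "diffusion reconstruction": partially noise an input with the closed-form forward process, then denoise it back step by step. `diffusion_reconstruct`, `eps_model` (a stand-in for a trained noise-prediction network) and `t_start` are hypothetical names, and the linear beta schedule is the one from the original DDPM paper (Ho et al., 2020), not necessarily the one used here.

```python
import numpy as np
from typing import Callable


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear variance schedule as in DDPM (Ho et al., 2020); an assumption, not this paper's setting."""
    return np.linspace(beta_start, beta_end, num_steps)


def diffusion_reconstruct(
    x0: np.ndarray,
    eps_model: Callable[[np.ndarray, int], np.ndarray],  # hypothetical trained noise predictor
    t_start: int,
    betas: np.ndarray,
    rng: np.random.Generator,
) -> np.ndarray:
    """Partially noise x0 to step t_start, then run ancestral DDPM sampling back to step 0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Forward process in closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    eps = rng.standard_normal(x0.shape)
    x = np.sqrt(alpha_bars[t_start]) * x0 + np.sqrt(1.0 - alpha_bars[t_start]) * eps

    # Reverse process: x_{t-1} = (x_t - beta_t / sqrt(1 - abar_t) * eps_hat) / sqrt(alpha_t) + sigma_t * z
    for t in range(t_start, -1, -1):
        eps_hat = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # sigma_t^2 = beta_t, one of the two variance choices discussed in DDPM
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added on the final step
    return x


# Usage with a dummy predictor standing in for a trained model:
rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
waveform = rng.standard_normal(16000)  # placeholder for 1 s of 16 kHz audio
recon = diffusion_reconstruct(waveform, lambda x, t: np.zeros_like(x), t_start=50, betas=betas, rng=rng)
```

In this generic scheme, a smaller `t_start` keeps the reconstruction closer to the input, while a larger one lets the model re-synthesize more of the signal. How the paper labels such reconstructions and mixes them into training is not described on this page.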

Key Contribution

Audio deepfake detectors trained on diffusion-reconstructed "hard" examples generalize better to unseen attacks, reducing the average Equal Error Rate (EER) relative to the baseline.
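EER is the metric behind the reported gains. It is the error rate at the decision threshold where the false-acceptance rate on spoofed audio equals the false-rejection rate on bona fide audio. Below is a minimal threshold-sweep sketch, assuming higher scores mean "bona fide". `compute_eer` is a hypothetical helper, and official evaluation scripts (for example, those released with the ASVspoof challenges) may interpolate the ROC curve differently.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    Assumed convention: higher score = more likely bona fide; a trial is
    accepted as bona fide when score >= threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection: bona fide trials scored below the threshold.
    frr = np.array([np.mean(bonafide < th) for th in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([np.mean(spoof >= th) for th in thresholds])
    # Pick the threshold where the two rates are closest and average them.
    idx = int(np.argmin(np.abs(frr - far)))
    return float((frr[idx] + far[idx]) / 2.0)


print(compute_eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # perfectly separated -> 0.0
```

An "average EER" over several evaluation sets, as in the abstract, would then be the mean of the per-set values.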

Abstract

Achieving robust generalization against unseen attacks remains a challenge in Audio Deepfake Detection (ADD), driven by the rapid evolution of generative models. To address this, we propose a framework centered on hard sample classification. The core idea is that a model capable of distinguishing challenging hard samples is inherently equipped to handle simpler cases effectively. We investigate multiple reconstruction paradigms, identifying the diffusion-based method as optimal for generating hard samples. Furthermore, we leverage multi-layer feature aggregation and introduce a Regularization-Assisted Contrastive Learning (RACL) objective to enhance generalizability. Experiments demonstrate the superior generalization of our approach, with our best model achieving a significant reduction in the average Equal Error Rate (EER) compared to the baseline.
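The abstract names two training ingredients but does not give their exact form. The sketch below makes two assumptions. First, multi-layer feature aggregation is modeled as a softmax-weighted sum over the hidden layers of a pretrained speech encoder, followed by mean pooling over time. Second, a standard supervised contrastive (SupCon) loss (Khosla et al., 2020) serves as the contrastive component. The regularization term that gives RACL its name is not described here, so it is deliberately left out. `aggregate_layers`, `layer_logits` and `supervised_contrastive_loss` are hypothetical names, not the authors' code.

```python
import numpy as np


def aggregate_layers(hidden_states: np.ndarray, layer_logits: np.ndarray) -> np.ndarray:
    """Assumed aggregation: softmax-weighted sum over layers, then mean over time.

    hidden_states: (num_layers, time, dim) features from a pretrained encoder (assumption).
    layer_logits:  (num_layers,) learnable layer scores (hypothetical parameter).
    """
    w = np.exp(layer_logits - layer_logits.max())
    w = w / w.sum()
    fused = np.tensordot(w, hidden_states, axes=(0, 0))  # (time, dim)
    return fused.mean(axis=0)  # (dim,)


def supervised_contrastive_loss(embeddings: np.ndarray, labels, temperature: float = 0.1) -> float:
    """Generic SupCon loss; RACL's extra regularization term is NOT included."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor is never compared with itself
    # Row-wise log-softmax over all other samples.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    # Mean log-probability of the positives, for anchors with at least one positive.
    mean_log_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return float(-mean_log_pos.mean())
```

With equal layer scores, the aggregation reduces to a plain average over layers. Learning `layer_logits` lets training emphasize whichever layers carry the most useful cues.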

Topics: Red-Teaming & Adversarial Robustness; Speech & Audio

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
