The paper introduces the All-Type Audio Deepfake Detection (AT-ADD) Grand Challenge to address the limitations of current audio deepfake detection methods, which are primarily speech-focused, lack robustness to real-world conditions, and generalize poorly across diverse audio types. AT-ADD comprises two tracks, robust speech deepfake detection and all-type audio deepfake detection, both supported by standardized datasets and evaluation protocols. The challenge aims to foster more robust and generalizable audio forensic technologies capable of handling diverse audio types and spoofing techniques.
Existing audio deepfake detectors are sitting ducks outside the lab: the AT-ADD challenge pushes research toward real-world robustness and generalization across all audio types.
The rapid advancement of Audio Large Language Models (ALLMs) has enabled cost-effective, high-fidelity generation and manipulation of both speech and non-speech audio, including sound effects, singing voices, and music. While these capabilities foster creativity and content production, they also introduce significant security and trust challenges, as realistic audio deepfakes can now be generated and disseminated at scale. Existing audio deepfake detection (ADD) countermeasures (CMs) and benchmarks, however, remain largely speech-centric, often relying on speech-specific artifacts and exhibiting limited robustness to real-world distortions, as well as restricted generalization to heterogeneous audio types and emerging spoofing techniques. To address these gaps, we propose the All-Type Audio Deepfake Detection (AT-ADD) Grand Challenge for ACM Multimedia 2026, designed to bridge controlled academic evaluation with practical multimedia forensics. AT-ADD comprises two tracks: (1) Robust Speech Deepfake Detection, which evaluates detectors under real-world scenarios and against unseen, state-of-the-art speech generation methods; and (2) All-Type Audio Deepfake Detection, which extends detection beyond speech to diverse, unknown audio types and promotes type-agnostic generalization across speech, sound, singing, and music. By providing standardized datasets, rigorous evaluation protocols, and reproducible baselines, AT-ADD aims to accelerate the development of robust and generalizable audio forensic technologies, supporting secure communication, reliable media verification, and responsible governance in an era of pervasive synthetic audio.