Stuttgart | Feb 18, 2026 | arXiv:2602.16343

How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

Yixuan Xiao, Florian Lux, Alejandro Pérez-González-de-Martos, Ngoc Thang Vu

AI Summary

This paper investigates how labeling codec-resynthesized audio as either bonafide or spoof affects audio deepfake detection, given that neural audio codecs serve both compression and speech synthesis. The authors construct a challenging extension of the ASVspoof 5 dataset to analyze the effects of different labeling strategies on detection performance. The study provides insights into how labeling choices influence the effectiveness of audio deepfake detection systems.

Key Contribution

Conflicting labels on codec-resynthesized audio can significantly impact audio deepfake detection, highlighting a critical challenge in dataset creation and model training.

Abstract

Since Text-to-Speech systems typically don't produce waveforms directly, recent spoof detection studies use resynthesized waveforms from vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are specifically designed for speech synthesis, neural audio codecs were originally developed for compressing audio for storage and transmission. However, their ability to discretize speech also sparked interest in language-modeling-based speech synthesis. Owing to this dual functionality, codec resynthesized data may be labeled as either bonafide or spoof. So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose. We examine how different labeling choices affect detection performance and provide insights into labeling strategies.
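To make the labeling question concrete, the following is a minimal sketch of how a single set of detector scores can yield different equal error rates (EER) depending on whether codec-resynthesized utterances are counted as bonafide or spoof. It is not the authors' evaluation code. The category names, the helper functions `apply_label_policy` and `equal_error_rate`, the convention that higher scores mean "more bonafide", and the toy scores are all assumptions made for illustration, and the EER here is a coarse threshold-sweep approximation.

```python
from typing import List

BONAFIDE, SPOOF = 1, 0


def apply_label_policy(categories: List[str], codec_label: int) -> List[int]:
    """Map utterance categories to binary labels.

    `codec_label` decides how codec-resynthesized audio is labeled
    (hypothetical category name "codec_resynth").
    """
    mapping = {"bonafide": BONAFIDE, "spoof": SPOOF, "codec_resynth": codec_label}
    return [mapping[c] for c in categories]


def equal_error_rate(scores: List[float], labels: List[int]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumes higher scores mean the detector believes the audio is bonafide.
    Returns the mean of false-acceptance and false-rejection rates at the
    threshold where the two are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == BONAFIDE]
    neg = [s for s, y in zip(scores, labels) if y == SPOOF]
    if not pos or not neg:
        raise ValueError("need at least one bonafide and one spoof sample")

    best_gap, best_eer = None, None
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= t for s in neg) / len(neg)  # spoof accepted as bonafide
        frr = sum(s < t for s in pos) / len(pos)   # bonafide rejected as spoof
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy data (invented for illustration, not taken from the paper).
categories = ["bonafide", "bonafide", "spoof", "spoof",
              "codec_resynth", "codec_resynth"]
scores = [0.9, 0.8, 0.1, 0.2, 0.85, 0.6]

eer_codec_as_bonafide = equal_error_rate(
    scores, apply_label_policy(categories, BONAFIDE))
eer_codec_as_spoof = equal_error_rate(
    scores, apply_label_policy(categories, SPOOF))
```

In this toy setup the detector's scores never change; only the ground-truth labels of the two codec-resynthesized utterances do, which is enough to shift the measured error rate. The same effect applies on the training side, where the chosen label determines what the detector is taught to accept.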

Red-Teaming & Adversarial Robustness, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
