Apr 6, 2026 | arXiv:2604.04841

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

Xuan-Bo Chen, Xuanjun Chen, Chia-Yu Hu, Sung-Feng Huang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang

AI Summary

This paper investigates the limitations of conventional 16 kHz-sampled detectors for Singing Voice Deepfake (SingFake) Detection (SVDD) and proposes a novel joint fullband-subband modeling framework using high-resolution (44.1 kHz) audio. The framework leverages a fullband model to capture global context and subband-specific experts to isolate fine-grained synthesis artifacts across different frequency ranges. Experiments on the WildSVDD dataset demonstrate that the proposed framework significantly outperforms 16 kHz-sampled models, highlighting the importance of high-resolution audio and strategic subband integration for robust SVDD.

Key Contribution

High-frequency content, which 16 kHz-sampled detectors discard, carries cues that are crucial for spotting singing voice deepfakes; modeling it at 44.1 kHz yields significantly better detection.
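To make "discarded" concrete, the short sketch below applies the Nyquist limit (half the sampling rate) to the two sampling rates named on this page. It is only arithmetic on those rates, not part of the paper's method, and the function names are illustrative.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


def band_lost_by_downsampling(low_rate_hz: float, high_rate_hz: float):
    """Return the (start, end) of the band present at high_rate_hz
    but absent after resampling to low_rate_hz."""
    return nyquist_hz(low_rate_hz), nyquist_hz(high_rate_hz)


lost_start, lost_end = band_lost_by_downsampling(16_000, 44_100)
# Share of the 44.1 kHz bandwidth that a 16 kHz detector never sees.
lost_fraction = (lost_end - lost_start) / lost_end

print(f"16 kHz audio keeps 0 to {lost_start:.0f} Hz")
print(f"44.1 kHz audio keeps 0 to {lost_end:.0f} Hz")
print(f"Fraction of bandwidth lost: {lost_fraction:.1%}")
```

Under this arithmetic, a 16 kHz detector sees nothing above 8 kHz, while 44.1 kHz audio extends to 22.05 kHz.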

Abstract

Rapid advances in singing voice synthesis have increased unauthorized imitation risks, creating an urgent need for better Singing Voice Deepfake (SingFake) Detection, also known as SVDD. Unlike speech, singing contains complex pitch, wide dynamic range, and timbral variations. Conventional 16 kHz-sampled detectors prove inadequate, as they discard vital high-frequency information. This study presents the first systematic analysis of high-resolution (44.1 kHz sampling rate) audio for SVDD. We propose a joint fullband-subband modeling framework: the fullband captures global context, while subband-specific experts isolate fine-grained synthesis artifacts unevenly distributed across the spectrum. Experiments on the WildSVDD dataset demonstrate that high-frequency subbands provide essential complementary cues. Our framework significantly outperforms 16 kHz-sampled models, proving that high-resolution audio and strategic subband integration are critical for robust in-the-wild detection.
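The joint fullband-subband idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The equal-width band split, the number of bands, the placeholder scorer, and the score-averaging fusion are all hypothetical choices, since the abstract does not specify them.

```python
from typing import Callable, List, Sequence, Tuple

# A magnitude spectrogram stored as [frequency_bin][time_frame].
Spectrogram = List[List[float]]
Scorer = Callable[[Spectrogram], float]


def subband_edges(n_bins: int, n_bands: int) -> List[Tuple[int, int]]:
    """Split n_bins frequency bins into n_bands contiguous ranges.

    Equal-width bands are an assumption made for illustration.
    """
    if not 0 < n_bands <= n_bins:
        raise ValueError("need 0 < n_bands <= n_bins")
    edges = [i * n_bins // n_bands for i in range(n_bands + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_bands)]


def mean_magnitude(spec: Spectrogram) -> float:
    """Placeholder scorer, NOT a trained detector: mean magnitude."""
    values = [v for row in spec for v in row]
    return sum(values) / len(values)


def joint_score(spec: Spectrogram,
                fullband_model: Scorer,
                subband_experts: Sequence[Scorer]) -> float:
    """Combine one fullband score with one score per subband expert.

    Averaging is a hypothetical fusion rule chosen for simplicity.
    """
    scores = [fullband_model(spec)]  # global context
    bands = subband_edges(len(spec), len(subband_experts))
    for (lo, hi), expert in zip(bands, subband_experts):
        scores.append(expert(spec[lo:hi]))  # band-local evidence
    return sum(scores) / len(scores)


# Toy input: 8 frequency bins, 2 frames, magnitude equal to the bin index.
toy_spec = [[float(b)] * 2 for b in range(8)]
toy_score = joint_score(toy_spec, mean_magnitude, [mean_magnitude] * 4)
```

In a real system each scorer would be a trained network, and the fusion would more likely be learned than a plain average. The sketch only shows how the fullband view and the per-band views of the same spectrogram feed a single decision.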

Red-Teaming & Adversarial Robustness, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
