Central Conservatory of Music, Fudan, SUSTech | Jun 15, 2026 | arXiv:2606.16612

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

Yan Han, Zhibin Wen, Yuan Wang, Shuangrun Shao, Xiaobing Li, Yang Xu, Wei Li

AI Summary

This paper introduces Sofia, a Synthetic Song Detection (SSD) framework that models music-intrinsic features through a flexible Mixture-of-Experts (MoE) architecture. Using feature-specific experts for Vocal, Audio-effect, and Global structure attributes, Sofia captures generator-agnostic cues that existing methods often miss. The framework is evaluated on the newly constructed MUSIC8K benchmark, where it improves the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while remaining robust to realistic audio perturbations.

Key Contribution

Sofia models music-intrinsic features (Vocal, Audio-effect, and Global structure) with feature-specific experts and an adaptive MoE module, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O.

Abstract

The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions and struggle to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global structure features, and their combinations, we present their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest emerging generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
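The abstract does not spell out how the adaptive MoE module combines the feature-specific experts, so the following is only a minimal sketch of one common pattern, soft gating, and not the authors' implementation. It assumes each expert (Vocal, Audio-effect, Global structure) has already produced a fixed-size embedding for a song; the gate vectors, classifier weights, embedding size, and all function names are hypothetical placeholders chosen for illustration.

```python
import math
import random

# Expert names follow the abstract; everything else here is an illustrative assumption.
EXPERTS = ("vocal", "audio_effect", "global_structure")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(embeddings, gate_vectors):
    """Score each expert with a dot product, then normalize the scores with softmax."""
    scores = [
        sum(g * x for g, x in zip(gate_vectors[name], embeddings[name]))
        for name in EXPERTS
    ]
    return dict(zip(EXPERTS, softmax(scores)))


def fuse(embeddings, weights):
    """Weighted sum of the expert embeddings, one output value per dimension."""
    dim = len(embeddings[EXPERTS[0]])
    return [
        sum(weights[name] * embeddings[name][i] for name in EXPERTS)
        for i in range(dim)
    ]


def synthetic_probability(fused, classifier_weights, bias=0.0):
    """Logistic head mapping the fused vector to a probability in (0, 1)."""
    logit = sum(w * x for w, x in zip(classifier_weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy run with random stand-ins for the expert outputs and learned parameters.
rng = random.Random(0)
dim = 4
embeddings = {name: [rng.uniform(-1, 1) for _ in range(dim)] for name in EXPERTS}
gates = {name: [rng.uniform(-1, 1) for _ in range(dim)] for name in EXPERTS}
clf = [rng.uniform(-1, 1) for _ in range(dim)]

weights = gate_weights(embeddings, gates)
fused = fuse(embeddings, weights)
prob = synthetic_probability(fused, clf)
```

In this sketch the gate weights sum to one, so an expert whose cues are uninformative for a given song can be down-weighted without being removed. Dropping a name from `EXPERTS` mimics configuring the framework with a subset of feature types, which is how individual and complementary contributions could be compared.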

Architecture Design (Transformers, SSMs, MoE) | Speech & Audio

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
