Central Conservatory of Music, Fudan, SUSTech | Jun 15, 2026 | arXiv:2606.16612

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

Yan Han, Zhibin Wen, Yuan Wang, Shuangrun Shao, Xiaobing Li, Yang Xu, Wei Li

AI Summary

This paper introduces Sofia, a Synthetic Song Detection (SSD) framework that models music-intrinsic features with feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. Configured with Vocal, Audio-effect, and Global structure features and their combinations, Sofia learns generator-agnostic representations. The framework is evaluated on the newly constructed MUSIC8K benchmark, where it improves the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while remaining robust to realistic audio perturbations.

Key Contribution

Sofia's use of music-intrinsic features yields an 18.5-point F1 improvement over the strongest baseline on MUSIC8K-O for synthetic song detection.

Abstract

The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions and struggle to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global structure features, and their combinations, we present their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest emerging generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
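The abstract does not spell out Sofia's architecture, so the following is only a minimal sketch of the general idea of feature-specific experts fused by an adaptive gate, under loud assumptions: a dense softmax gate, single-layer experts, untrained random weights, and hypothetical names (`ToyMoEDetector`, `vocal`, `audio_effect`, `global_structure`) and feature dimensions. It is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(rng, out_dim, in_dim):
    """Random (untrained) weights and zero bias for a linear layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """Apply y = Wx + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class ToyMoEDetector:
    """Hypothetical sketch: one expert per feature family, soft gate on top."""

    def __init__(self, feature_dims, hidden_dim=4, seed=0):
        rng = random.Random(seed)
        self.names = list(feature_dims)
        # One expert per feature family, each mapping to a shared hidden size.
        self.experts = {name: make_linear(rng, hidden_dim, dim)
                        for name, dim in feature_dims.items()}
        # The gate sees all expert embeddings and emits one score per expert.
        self.gate = make_linear(rng, len(self.names),
                                hidden_dim * len(self.names))
        # Binary head: probability that the song is synthetic.
        self.head = make_linear(rng, 1, hidden_dim)

    def forward(self, features):
        embeddings = [
            [math.tanh(v) for v in linear(*self.experts[name], features[name])]
            for name in self.names
        ]
        concat = [v for emb in embeddings for v in emb]
        gate_weights = softmax(linear(*self.gate, concat))
        # Adaptive fusion: gate-weighted sum of the expert embeddings.
        fused = [sum(g * emb[i] for g, emb in zip(gate_weights, embeddings))
                 for i in range(len(embeddings[0]))]
        logit = linear(*self.head, fused)[0]
        prob = 1.0 / (1.0 + math.exp(-logit))
        return prob, dict(zip(self.names, gate_weights))


# Usage with made-up feature vectors (the dimensions are assumptions).
model = ToyMoEDetector({"vocal": 3, "audio_effect": 2, "global_structure": 5})
prob, gates = model.forward({
    "vocal": [0.2, -0.1, 0.7],
    "audio_effect": [0.5, 0.3],
    "global_structure": [0.1, 0.0, -0.4, 0.9, 0.2],
})
print(prob, gates)
```

Because the gate weights are recomputed for each input, the relative contribution of each feature family can vary per song. Dropping an entry from `feature_dims` gives a single-feature or pairwise configuration of the kind the abstract describes when it compares individual and combined features.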

Architecture Design (Transformers, SSMs, MoE) | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
