This paper introduces Sofia, a novel Synthetic Song Detection (SSD) framework that leverages music-intrinsic features through a flexible Mixture-of-Experts (MoE) approach. By employing feature-specific experts and combining various music attributes, Sofia achieves generator-agnostic representations that enhance detection capabilities. The framework's effectiveness is validated on the newly constructed MUSIC8K benchmark, where it outperforms the strongest baseline by 18.5 points in F1 score while demonstrating robust performance against realistic audio perturbations.
Sofia's use of music-intrinsic features yields an 18.5-point F1 improvement over the strongest baseline in synthetic song detection.
The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions and struggle to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global structure features, as well as their combinations, we examine their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest emerging generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
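To make the idea of feature-specific experts combined by an adaptive MoE module more concrete, the following is a minimal NumPy sketch of one common gating scheme. It is not the authors' implementation: the class name `MoEFusion`, the linear gate over concatenated expert embeddings, the embedding size, and the logistic output head are all assumptions chosen for illustration, and the expert embeddings are random placeholders rather than real Vocal, Audio-effect, or Global structure features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


class MoEFusion:
    """Hypothetical gate that mixes per-expert embeddings into one vector.

    Assumption: each expert (e.g. vocal, audio-effect, global structure)
    has already mapped a song to an embedding of the same size `dim`.
    """

    def __init__(self, n_experts, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Linear gate: concatenated expert embeddings -> one score per expert.
        self.w_gate = rng.normal(scale=0.1, size=(n_experts * dim, n_experts))
        # Logistic head on the fused embedding (real vs. synthetic).
        self.w_out = rng.normal(scale=0.1, size=dim)
        self.b_out = 0.0

    def gate(self, expert_embs):
        """Return per-sample mixing weights of shape (batch, n_experts)."""
        batch = expert_embs.shape[0]
        flat = expert_embs.reshape(batch, -1)
        return softmax(flat @ self.w_gate, axis=-1)

    def forward(self, expert_embs):
        """expert_embs has shape (batch, n_experts, dim)."""
        weights = self.gate(expert_embs)
        # Weighted sum over the expert axis gives the fused representation.
        fused = (weights[:, :, None] * expert_embs).sum(axis=1)
        logits = fused @ self.w_out + self.b_out
        prob_synthetic = 1.0 / (1.0 + np.exp(-logits))
        return prob_synthetic, weights


# Placeholder embeddings: 4 songs, 3 experts, 16 dimensions each.
embs = np.random.default_rng(1).normal(size=(4, 3, 16))
model = MoEFusion(n_experts=3, dim=16)
probs, weights = model.forward(embs)
```

In this sketch the gate produces a different set of expert weights for every song, which is one way to read "adaptive": a track with no vocals could, after training, shift weight away from a vocal expert. The weights here are untrained, so the outputs carry no meaning beyond illustrating the shapes and the data flow.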