This paper introduces Sofia, a Synthetic Song Detection (SSD) framework that models music-intrinsic features through a flexible Mixture-of-Experts (MoE) architecture. Using feature-specific experts for Vocal, Audio-effect, and Global structure attributes, Sofia captures generator-agnostic cues that existing methods often miss. The framework is evaluated on the newly constructed MUSIC8K benchmark, where it improves the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O and remains robust under realistic audio perturbations.
By modeling music-intrinsic features rather than low-level artifacts, Sofia improves the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O.
The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions and struggle to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global structure features, as well as their combinations, we show their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest emerging generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
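The general idea of combining feature-specific experts through an adaptive gate can be illustrated with a toy sketch. This is not Sofia's implementation: the abstract does not specify the expert networks, the gating design, or the feature extractors, so the class name `ToyMoEDetector`, the feature dimensions, the linear-plus-tanh experts, the softmax gate over concatenated features, and the random untrained weights are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, n_out, n_in):
    """Random weights and zero bias for a small linear layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyMoEDetector:
    """Hypothetical MoE: one expert per feature group, fused by a softmax gate."""

    def __init__(self, feature_dims, embed_dim=4, seed=0):
        rng = random.Random(seed)
        self.names = list(feature_dims)
        # One expert per feature group, each mapping its features to a shared embedding size.
        self.experts = {n: init_linear(rng, embed_dim, d) for n, d in feature_dims.items()}
        # The gate sees all features and emits one score per expert (an assumed design).
        self.gate = init_linear(rng, len(self.names), sum(feature_dims.values()))
        # Binary head: probability that the song is synthetic.
        self.head = init_linear(rng, 1, embed_dim)

    def forward(self, features):
        embeddings = [
            [math.tanh(v) for v in linear(*self.experts[n], features[n])]
            for n in self.names
        ]
        concat = [v for n in self.names for v in features[n]]
        gate_weights = softmax(linear(*self.gate, concat))
        # Weighted sum of expert embeddings, dimension by dimension.
        fused = [
            sum(g * e[i] for g, e in zip(gate_weights, embeddings))
            for i in range(len(embeddings[0]))
        ]
        logit = linear(*self.head, fused)[0]
        prob = 1.0 / (1.0 + math.exp(-logit))
        return prob, dict(zip(self.names, gate_weights))


# Made-up feature vectors standing in for real Vocal, Audio-effect, and Global structure features.
model = ToyMoEDetector({"vocal": 3, "audio_effect": 2, "global_structure": 4})
prob, gates = model.forward({
    "vocal": [0.2, -0.1, 0.5],
    "audio_effect": [0.7, 0.3],
    "global_structure": [0.1, 0.0, -0.4, 0.9],
})
print(f"P(synthetic) = {prob:.3f}")
print("gate weights:", {k: round(v, 3) for k, v in gates.items()})
```

The gate weights sum to one and give a per-song indication of how much each feature group contributes, which is one plausible way to inspect the individual and complementary contributions the abstract mentions. Dropping a key from `feature_dims` gives a single-feature or two-feature configuration of the same sketch.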