Mar 30, 2026 | arXiv:2603.28644

Constructing Composite Features for Interpretable Music-Tagging

Chenhao Xue, Wei Hu, Weitao Hu, Joyraj Chakraborty, Zhijin Guo, Kang Li, Tianyu Shi, Martin Reed, Nikolaos Thomos

AI Summary

This paper introduces a Genetic Programming (GP) pipeline to automatically construct composite audio features for music tagging by mathematically combining base features. The goal is to improve performance while maintaining interpretability, addressing the limitations of deep learning-based feature fusion. Experiments on MTG-Jamendo and GTZAN datasets show consistent improvements over state-of-the-art systems, with effective feature combinations found within modest GP evaluation budgets.

Key Contribution

Interpretable composite features evolved via Genetic Programming yield consistent improvements over state-of-the-art music-tagging systems and reveal which feature interactions and transformations boost performance.

Abstract

Combining multiple audio features can improve the performance of music tagging, but common deep learning-based feature fusion methods often lack interpretability. To address this problem, we propose a Genetic Programming (GP) pipeline that automatically evolves composite features by mathematically combining base music features, thereby capturing synergistic interactions while preserving interpretability. This approach provides representational benefits similar to deep feature fusion without sacrificing interpretability. Experiments on the MTG-Jamendo and GTZAN datasets demonstrate consistent improvements over state-of-the-art systems across base feature sets at different abstraction levels. Most of the performance gains appear within the first few hundred GP evaluations, indicating that effective feature combinations can be identified under modest search budgets. The top evolved expressions include linear, nonlinear, and conditional forms, and several low-complexity solutions reach top performance, consistent with the parsimony pressure that favors simpler expressions. Analyzing these composite features further reveals which interactions and transformations tend to be beneficial for tagging, offering insights that remain opaque in black-box deep models.
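
The idea of composing base features into readable expressions under parsimony pressure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the primitive set (`add`, `mul`, `log1p_abs`, `if_gt0`), the fitness (absolute Pearson correlation with a binary tag minus a size penalty weighted by `alpha`), the toy data, and the use of plain random sampling within an evaluation budget in place of a full evolutionary loop with crossover and mutation.

```python
import math
import random

# Hypothetical primitive set: name -> (arity, function).
# It covers a linear, a nonlinear, and a conditional form.
PRIMITIVES = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "log1p_abs": (1, lambda a: math.log1p(abs(a))),
    "if_gt0": (3, lambda c, a, b: a if c > 0 else b),
}

def random_tree(rng, n_features, depth):
    """Sample a random expression tree; a leaf is ("x", feature_index)."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_features))
    name = rng.choice(sorted(PRIMITIVES))
    arity = PRIMITIVES[name][0]
    children = tuple(random_tree(rng, n_features, depth - 1) for _ in range(arity))
    return (name,) + children

def evaluate(tree, x):
    """Compute the composite feature value for one sample x."""
    if tree[0] == "x":
        return x[tree[1]]
    func = PRIMITIVES[tree[0]][1]
    return func(*(evaluate(child, x) for child in tree[1:]))

def size(tree):
    """Count nodes; used as the complexity measure for parsimony pressure."""
    return 1 if tree[0] == "x" else 1 + sum(size(c) for c in tree[1:])

def to_string(tree):
    """Render the tree as a human-readable expression."""
    if tree[0] == "x":
        return f"x{tree[1]}"
    return f"{tree[0]}({', '.join(to_string(c) for c in tree[1:])})"

def fitness(tree, X, y, alpha=0.01):
    """Assumed fitness: |Pearson correlation| with the tag minus alpha * size."""
    values = [evaluate(tree, x) for x in X]
    n = len(y)
    mean_v, mean_y = sum(values) / n, sum(y) / n
    cov = sum((v - mean_v) * (t - mean_y) for v, t in zip(values, y))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_y = sum((t - mean_y) ** 2 for t in y)
    denom = math.sqrt(var_v * var_y)
    corr = abs(cov) / denom if denom > 0 else 0.0  # constant feature -> 0
    return corr - alpha * size(tree)

def search(X, y, budget=300, max_depth=3, seed=0):
    """Random search over expressions within a fixed evaluation budget."""
    rng = random.Random(seed)
    candidates = (random_tree(rng, len(X[0]), max_depth) for _ in range(budget))
    best = max(candidates, key=lambda t: fitness(t, X, y))
    return best, fitness(best, X, y)

# Toy data: the tag is 1 exactly when the two base features share a sign,
# so neither feature alone is informative but their product is.
X = [(1, 1), (1, -1), (-1, 1), (-1, -1), (2, 2), (2, -2), (-2, 2), (-2, -2)]
y = [1, 0, 0, 1, 1, 0, 0, 1]

best, score = search(X, y)
print(to_string(best), round(score, 3))
```

On this toy data a single base feature has zero correlation with the tag, while the three-node product `mul(x0, x1)` correlates strongly, which is the kind of synergistic interaction a composite feature can expose. The size penalty makes a smaller expression win over a larger one of equal correlation.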

Interpretability & Mechanistic Interp | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
