Apr 22, 2026 · arXiv:2604.20270

Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations

AI Summary

This paper investigates embedding-based intrusive metrics for musical source separation (MSS) using MERT representations, motivated by the limitations of traditional BSS-Eval metrics. The authors compute an MSE and an intrusive FAD variant on MERT embeddings and measure how well each correlates with perceptual audio quality ratings. Results on two datasets show that these embedding-based metrics correlate more strongly with perceptual quality than BSS-Eval across all analyzed stem and model types.
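To make the comparison concrete, the following is a minimal sketch of how a metric's agreement with listener ratings could be scored. It assumes Spearman rank correlation, which is an illustrative choice: the text above does not say which correlation coefficient the authors use. The function names and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean 1-based position of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example values: one listening-test score and one metric value per item.
listener_scores = [80, 65, 40, 90, 55]
metric_distances = [0.12, 0.20, 0.45, 0.08, 0.30]
rho = spearman(listener_scores, metric_distances)
```

Because a distance-type metric is lower for better separations, a useful metric yields a strongly negative coefficient here; the strength of the relationship is the magnitude of `rho`.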

Key Contribution

Intrusive metrics computed on MERT embeddings (an MSE and an intrusive FAD variant) correlate more strongly with perceptual audio quality ratings than BSS-Eval on both datasets studied.

Abstract

Evaluation of musical source separation (MSS) has traditionally relied on Blind Source Separation Evaluation (BSS-Eval) metrics. However, recent work suggests that BSS-Eval metrics correlate poorly with perceptual audio quality ratings from listening tests, which are considered the gold standard for evaluation. As an alternative, embedding-based intrusive metrics have been introduced for singing voice separation; these leverage latent representations from large self-supervised audio models such as Music undERstanding with large-scale self-supervised Training (MERT). In this work, we analyze the correlation of perceptual audio quality ratings with two intrusive embedding-based metrics: a mean squared error (MSE) and an intrusive variant of the Fréchet Audio Distance (FAD), both calculated on MERT embeddings. Experiments on two independent datasets show that these metrics correlate more strongly with perceptual audio quality ratings than traditional BSS-Eval metrics across all analyzed stem and model types.
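As a rough illustration of the two metric families named in the abstract, the sketch below computes an MSE and a Fréchet distance between a reference stem's embeddings and a separated stem's embeddings. It assumes the embeddings are already extracted as time-aligned arrays of shape `(frames, dims)`; loading MERT, choosing a layer, and any time aggregation are omitted because the abstract does not specify them. The function names are hypothetical, and the Fréchet formula is the standard Gaussian form used by FAD, not necessarily the exact intrusive variant defined in the paper.

```python
import numpy as np


def embedding_mse(ref_emb: np.ndarray, est_emb: np.ndarray) -> float:
    """Mean squared error between two time-aligned embedding sequences."""
    if ref_emb.shape != est_emb.shape:
        raise ValueError("embeddings must have the same shape")
    return float(np.mean((ref_emb - est_emb) ** 2))


def frechet_distance(ref_emb: np.ndarray, est_emb: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (frames, dims) sets."""
    mu_r, mu_e = ref_emb.mean(axis=0), est_emb.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(ref_emb, rowvar=False))
    cov_e = np.atleast_2d(np.cov(est_emb, rowvar=False))
    # tr(sqrt(cov_r @ cov_e)) equals the sum of the square roots of the
    # product's eigenvalues, which are real and non-negative for PSD inputs.
    eigvals = np.linalg.eigvals(cov_r @ cov_e)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    dist = (
        np.sum((mu_r - mu_e) ** 2)
        + np.trace(cov_r)
        + np.trace(cov_e)
        - 2.0 * tr_sqrt
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return float(max(dist, 0.0))


# Stand-in data: random arrays in place of real MERT embeddings (assumption).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))
estimate = reference + 0.1 * rng.normal(size=(200, 4))
mse_value = embedding_mse(reference, estimate)
fd_value = frechet_distance(reference, estimate)
```

Both functions are intrusive in the sense that they need the ground-truth reference stem: the MSE compares embeddings frame by frame, while the Fréchet distance compares only the mean and covariance of the two embedding sets.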

Eval Frameworks & Benchmarks · Recommendation & Information Retrieval · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
