This paper introduces APEX, a multi-task learning framework trained on a large dataset of AI-generated music from Suno and Udio (211k songs, 10k hours) to predict popularity (streams and likes) alongside five perceptual aesthetic quality dimensions. The model operates on frozen MERT embeddings. In an out-of-distribution evaluation on pairwise human preferences across generative music systems unseen during training, incorporating aesthetic features consistently improves preference prediction, which points to the value of aesthetic quality in popularity prediction.
Aesthetic quality unlocks better generalization in AI-generated music popularity prediction, beating models trained solely on engagement metrics.
Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. A key yet unexplored factor in this setting is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio. From frozen audio embeddings extracted from MERT, a self-supervised music understanding model, APEX jointly predicts engagement-based popularity signals (streams and likes scores) alongside five perceptual aesthetic quality dimensions. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, which comprises pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
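The setup described in the abstract (task-specific predictions computed from frozen embeddings, then evaluated on pairwise preference battles) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the head architecture, the loss, or how scores are turned into a preference, so the shared ReLU trunk, the weighted mean-squared-error loss, the "higher score wins" rule, and the names `init_params`, `forward`, `multitask_loss`, and `pairwise_accuracy` are all assumptions made for illustration. The embedding width of 768 is likewise assumed, and random arrays stand in for real MERT embeddings.

```python
import numpy as np

EMBED_DIM = 768   # assumed embedding width (hypothetical, not stated in the source)
HIDDEN_DIM = 64   # hypothetical trunk width
N_AESTHETIC = 5   # five aesthetic quality dimensions, per the abstract


def init_params(rng, embed_dim=EMBED_DIM, hidden_dim=HIDDEN_DIM):
    """Random weights for a shared trunk and three task heads (assumed design)."""
    def dense(n_in, n_out):
        weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        return weights, np.zeros(n_out)

    return {
        "trunk": dense(embed_dim, hidden_dim),
        "streams": dense(hidden_dim, 1),
        "likes": dense(hidden_dim, 1),
        "aesthetic": dense(hidden_dim, N_AESTHETIC),
    }


def forward(params, embeddings):
    """Map frozen embeddings (batch, embed_dim) to one prediction array per task."""
    w, b = params["trunk"]
    hidden = np.maximum(embeddings @ w + b, 0.0)  # shared ReLU layer
    outputs = {}
    for task in ("streams", "likes", "aesthetic"):
        w, b = params[task]
        outputs[task] = hidden @ w + b
    return outputs


def multitask_loss(preds, targets, weights=None):
    """Weighted sum of per-task mean squared errors (an assumed loss)."""
    if weights is None:
        weights = {task: 1.0 for task in preds}
    return sum(
        weights[task] * np.mean((preds[task] - targets[task]) ** 2)
        for task in preds
    )


def pairwise_accuracy(scores_a, scores_b, a_wins):
    """Fraction of battles where the higher-scored song matches the human choice."""
    predicted_a_wins = np.asarray(scores_a) > np.asarray(scores_b)
    return float(np.mean(predicted_a_wins == np.asarray(a_wins)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    fake_embeddings = rng.normal(size=(8, EMBED_DIM))  # stand-in for MERT output
    preds = forward(params, fake_embeddings)
    print({task: values.shape for task, values in preds.items()})
```

Only the head parameters would be trained in a setup like this. The embeddings are treated as fixed inputs, which is what "frozen" means in the abstract. For the out-of-distribution evaluation, one plausible reading is that each song in a battle receives a scalar score derived from the model outputs and that `pairwise_accuracy` compares the implied winner with the human vote. How APEX actually combines popularity and aesthetic outputs into that score is not stated here.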