This paper introduces AMRS, an affective music recommendation system deployed on health-and-wellness platforms and designed to optimize the listener's affective state under ethical constraints that limit online experimentation. The system uses a causal transformer trained on logged listening data to predict engagement, ratings, valence, and arousal; this transformer serves as a world model for offline policy training and stress-testing. Direct Preference Optimization (DPO) is then used to fine-tune a behavior-cloned recommender policy offline, improving predicted valence and arousal without sacrificing diversity.
Offline policy optimization against a world model enables affective music recommendation that improves predicted user valence and arousal, even when ethical constraints preclude online experimentation.
Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethically constrained, particularly for clinical populations who cannot reliably skip a song or report distress. We describe AMRS, the Affective Music Recommendation System deployed on LUCID's health-and-wellness platforms, which serve clinical users (primarily older adults with neurocognitive conditions) and consumer-wellness users across energize, focus, calm, and sleep modes. AMRS is built around a rollout-based world model: a causal transformer trained on logged listening data to jointly predict engagement, binary rating, and self-reported valence and arousal. The world model serves both as an in-silico simulator for offline policy training and as a stress-testing tool before deployment. A recommender policy initialized by behaviour cloning is fine-tuned offline with Direct Preference Optimization (DPO) against a configurable multi-objective utility function. Under a strict cold-start protocol, the world model predicts both behavioural and affective signals with usable fidelity; DPO improves predicted valence and arousal over the cloned baseline while maintaining a similar diversity profile and avoiding the distributional collapse produced by greedy optimization. We position the work as an early deployed validation of a methodology for affective recommendation when online experimentation is ethically untenable.
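The offline fine-tuning step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the form of the utility function, its weights, or how preference pairs are constructed, so the weighted-sum `utility`, the `preference_pair` helper, the `Prediction` fields, and all numeric values below are assumptions made for illustration. Only the `dpo_loss` function follows the standard published DPO objective for a single (chosen, rejected) pair.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical world-model outputs for one candidate recommendation."""
    engagement: float
    rating: float
    valence: float
    arousal: float


def utility(pred: Prediction, weights: dict) -> float:
    # Assumed form: a weighted sum over whichever signals appear in `weights`.
    return sum(w * getattr(pred, name) for name, w in weights.items())


def preference_pair(a: Prediction, b: Prediction, weights: dict):
    # Rank two simulated candidates by utility; returns (chosen, rejected).
    return (a, b) if utility(a, weights) >= utility(b, weights) else (b, a)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    The reference log-probabilities would come from the frozen
    behaviour-cloned policy; `beta` controls how far the fine-tuned
    policy may drift from it.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative values only; not taken from the paper.
weights = {"valence": 0.5, "arousal": 0.3, "engagement": 0.2}
track_a = Prediction(engagement=0.7, rating=1.0, valence=0.6, arousal=0.4)
track_b = Prediction(engagement=0.9, rating=1.0, valence=0.2, arousal=0.3)
chosen, rejected = preference_pair(track_a, track_b, weights)
loss = dpo_loss(-1.2, -1.5, -1.4, -1.3)
```

Changing the entries in `weights` is one way a "configurable" utility could re-rank the same simulated candidates, for example to favour low arousal in a sleep mode. In a real training loop the log-probabilities would be differentiable tensors and the loss would be averaged over a batch of pairs.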