Apr 8, 2026 · arXiv:2604.07427

Personalizing Text-to-Image Generation to Individual Taste

Anne-Sofie Maerten, Juliane Verwiebe, Shyamgopal Karthik, Ameya Prabhu, Johan Wagemans, Matthias Bethge

AI Summary

This paper introduces PAMELA, a new dataset of 70,000 personalized image ratings across 5,000 images generated by state-of-the-art text-to-image models, with each image rated by 15 unique users. The authors train a personalized reward model on PAMELA and existing aesthetic datasets, and report that it predicts individual image preferences more accurately than most models trained for population-level preferences predict those population-level preferences. Finally, they show that this personalized reward model can steer text-to-image generation towards individual user tastes via prompt optimization.

Key Contribution

Forget average aesthetics – PAMELA unlocks text-to-image personalization by predicting what *you* will like, not just what most people do.

Abstract

Modern text-to-image (T2I) models generate high-fidelity visuals but remain indifferent to individual user preferences. While existing reward models optimize for "average" human appeal, they fail to capture the inherent subjectivity of aesthetic judgment. In this work, we introduce a novel dataset and predictive framework, called PAMELA, designed to model personalized image evaluations. Our dataset comprises 70,000 ratings across 5,000 diverse images generated by state-of-the-art models (Flux 2 and Nano Banana). Each image is evaluated by 15 unique users, providing a rich distribution of subjective preferences across domains such as art, design, fashion, and cinematic photography. Leveraging this data, we propose a personalized reward model trained jointly on our high-quality annotations and existing aesthetic assessment subsets. We demonstrate that our model predicts individual liking with higher accuracy than the majority of current state-of-the-art methods predict population-level preferences. Using our personalized predictor, we demonstrate how simple prompt optimization methods can be used to steer generations towards individual user preferences. Our results highlight the importance of data quality and personalization to handle the subjectivity of user preferences. We release our dataset and model to facilitate standardized research in personalized T2I alignment and subjective visual quality assessment.
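
The abstract does not say which prompt optimization method is used, so the following is only a minimal sketch of one plausible approach: a greedy search that appends style modifiers to a prompt whenever a per-user scorer predicts the user will like the result more. Every name here (`optimize_prompt`, `toy_score`, `TOY_TASTES`, the user IDs and the modifiers) is hypothetical and not taken from the paper. In a real pipeline the scorer would generate an image from the prompt and rate it with the personalized reward model; here a keyword-weight lookup stands in for that step so the example runs on its own.

```python
from typing import Callable, Sequence

# Hypothetical per-user tastes; a stand-in for a learned personalized reward model.
TOY_TASTES = {
    "user_a": {"film grain": 1.0, "muted colors": 0.5, "neon": -1.0},
    "user_b": {"neon": 1.0, "high contrast": 0.5, "muted colors": -0.5},
}


def toy_score(prompt: str, user_id: str) -> float:
    """Toy scorer: sum the user's weights for keywords present in the prompt.

    Assumption: a real scorer would render an image from `prompt` and ask the
    personalized reward model how much `user_id` is predicted to like it.
    """
    return sum(w for kw, w in TOY_TASTES[user_id].items() if kw in prompt)


def optimize_prompt(
    base_prompt: str,
    modifiers: Sequence[str],
    score: Callable[[str, str], float],
    user_id: str,
    rounds: int = 3,
) -> tuple[str, float]:
    """Greedy search: each round, append the modifier that most raises the score."""
    best_prompt = base_prompt
    best_score = score(best_prompt, user_id)
    remaining = list(modifiers)
    for _ in range(rounds):
        candidates = [(score(f"{best_prompt}, {m}", user_id), m) for m in remaining]
        if not candidates:
            break
        cand_score, cand_mod = max(candidates)
        if cand_score <= best_score:
            break  # no modifier strictly improves the predicted liking
        best_prompt = f"{best_prompt}, {cand_mod}"
        best_score = cand_score
        remaining.remove(cand_mod)
    return best_prompt, best_score


if __name__ == "__main__":
    mods = ["film grain", "muted colors", "neon", "high contrast"]
    for uid in ("user_a", "user_b"):
        print(uid, optimize_prompt("a portrait of a sailor", mods, toy_score, uid))
```

Starting from the same base prompt, the two toy users end up with different prompts, which is the behavior personalization is meant to produce. Greedy search is chosen here only for brevity; other search strategies would fit the same scorer interface.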

Computer Vision, Multimodal Models, RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 63

Year: 2026

Venue: N/A
