This paper introduces Dual-supervised Image Aesthetic Enhancement (DIAE), a diffusion model that enhances image aesthetics by incorporating Multimodal Aesthetic Perception (MAP) to convert ambiguous aesthetic instructions into explicit guidance. To address the scarcity of perfectly paired images, the authors collected an "imperfectly-paired" dataset (IIAEData) and used a dual-branch supervision framework to exploit its weak matching characteristics during training. Experiments show DIAE outperforms baselines in image aesthetic scores and content consistency.
Diffusion models can now follow ambiguous aesthetic instructions with high fidelity by conditioning on multimodal aesthetic perception and training on imperfectly paired datasets.
Image aesthetic enhancement aims to perceive aesthetic deficiencies in images and perform corresponding editing operations, which is highly challenging and requires the model to possess creativity and aesthetic perception capabilities. Although recent advancements in image editing models have significantly enhanced their controllability and flexibility, they struggle with enhancing image aesthetics. The primary challenges are twofold: first, following editing instructions with aesthetic perception is difficult, and second, there is a scarcity of "perfectly-paired" images that have consistent content but distinct aesthetic qualities. In this paper, we propose Dual-supervised Image Aesthetic Enhancement (DIAE), a diffusion-based generative model with multimodal aesthetic perception. First, DIAE incorporates Multimodal Aesthetic Perception (MAP) to convert ambiguous aesthetic instructions into explicit guidance by (i) employing detailed, standardized aesthetic instructions across multiple aesthetic attributes, and (ii) utilizing multimodal control signals derived from text-image pairs that maintain consistency within the same aesthetic attribute. Second, to mitigate the lack of "perfectly-paired" images, we collect an "imperfectly-paired" dataset called IIAEData, consisting of images with varying aesthetic qualities while sharing identical semantics. To better leverage the weak matching characteristics of IIAEData during training, a dual-branch supervision framework is also introduced for weakly supervised image aesthetic enhancement. Experimental results demonstrate that DIAE outperforms the baselines and obtains superior image aesthetic scores and image content consistency scores.
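The abstract does not specify the training objective of the dual-branch supervision framework. As a rough illustration only, a weakly supervised dual-branch loss might combine an aesthetic branch (supervising toward the higher-aesthetic image) with a content branch (penalizing semantic drift from the source), weighted to reflect the weak matching of imperfectly paired data. All function names, weights, and the MSE form below are hypothetical, not taken from the paper:

```python
import numpy as np

def dual_branch_loss(pred, target_aes, target_content,
                     lambda_aes=1.0, lambda_content=0.5):
    """Hypothetical dual-branch objective for weakly supervised training.

    pred:           model prediction (e.g. predicted noise or image residual)
    target_aes:     supervision signal from the higher-aesthetic image
    target_content: supervision signal preserving the source image's content
    The branch weights and MSE terms are illustrative assumptions.
    """
    aes_loss = np.mean((pred - target_aes) ** 2)          # aesthetic branch
    content_loss = np.mean((pred - target_content) ** 2)  # content branch
    return lambda_aes * aes_loss + lambda_content * content_loss

# Toy usage with dummy arrays (shapes stand in for image tensors)
pred = np.zeros((4, 4))
loss = dual_branch_loss(pred, np.ones((4, 4)), np.zeros((4, 4)))
```

With the toy inputs above, only the aesthetic branch contributes, so the two terms can be balanced by tuning the lambda weights; how the actual paper balances its branches is not stated in the abstract.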