Huawei | Apr 21, 2026 | arXiv:2604.19406

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

Fan Li, Chong Wang, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo

AI Summary

This paper introduces HP-Edit, a post-training framework that aligns diffusion-based image editing models with human preferences using Reinforcement Learning from Human Feedback (RLHF). The authors build RealPref-50K, a dataset of 50,000 real-world image editing preferences, and train HP-Scorer, a VLM-based automatic evaluator that serves as the reward function for post-training. Experiments show that HP-Edit significantly improves alignment with human preferences for models such as Qwen-Image-Edit-2509.

Key Contribution

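A VLM-based scorer, developed from only a small amount of human-preference scoring data, can serve as the reward function for aligning image editing diffusion models with human preferences, so post-training does not depend on large-scale human annotation.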

Abstract

Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to a lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset that spans eight common tasks and balances common object editing. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer, an automatic, human preference-aligned evaluator. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.
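The two roles the abstract gives HP-Scorer (building a preference dataset and acting as a reward) can be illustrated with a minimal sketch. Everything named here is hypothetical: the function names, the `margin` threshold, the toy scores, and the stand-in scorer are assumptions for illustration, since this page does not describe the real HP-Scorer interface. The group-normalized advantage is the generic GRPO-style formulation, not necessarily the objective HP-Edit uses.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Callable, List, Tuple


def build_preference_pairs(
    candidates: List[str],
    scorer: Callable[[str], float],
    margin: float = 1.0,  # hypothetical threshold, not from the paper
) -> List[Tuple[str, str]]:
    """Return (preferred, rejected) pairs whose score gap is at least `margin`."""
    scored = [(c, scorer(c)) for c in candidates]
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        if abs(score_a - score_b) >= margin:
            # Put the higher-scored candidate first.
            pairs.append((a, b) if score_a > score_b else (b, a))
    return pairs


def group_relative_advantages(scores: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of edits sampled for the same input."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


# Toy stand-in for a scorer: a lookup table of made-up scores for three
# candidate edits of the same source image and instruction.
toy_scores = {"edit_a": 4.0, "edit_b": 2.5, "edit_c": 3.8}

pairs = build_preference_pairs(list(toy_scores), toy_scores.__getitem__, margin=1.0)
advantages = group_relative_advantages(list(toy_scores.values()))
```

In this toy setup, pairs with a small score gap are skipped so that near-ties do not enter the preference set, and the advantages rank the candidates relative to their own group rather than on an absolute scale.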

Computer Vision, Multimodal Models, RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 55

Year: 2026

Venue: N/A
