PKU | Feb 27, 2026 | arXiv:2602.24233

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Zhenyu Tang, Chaoran Feng, Yufan Deng, Jie Wu, Xiaojie Li, Yunpeng Chen, Daquan Zhou

AI Summary

This paper introduces a reward modeling approach to improve spatial understanding in text-to-image generation. The authors construct the SpatialReward-Dataset of over 80k preference pairs and train a reward model, SpatialScore, to evaluate the accuracy of spatial relationships in generated images. Using SpatialScore as the reward for online reinforcement learning yields significant and consistent gains in spatial understanding across multiple benchmarks.

Key Contribution

A reward model trained on spatial relationship preferences surpasses leading proprietary models at evaluating spatial relationships in generated images, and it serves as an effective reward signal for online RL of image generation models.

Abstract

Recent progress in text-to-image generation has greatly advanced visual fidelity and creativity, but it has also imposed higher demands on prompt complexity, particularly in encoding intricate spatial relationships. In such cases, achieving satisfactory results often requires multiple sampling attempts. To address this challenge, we introduce a novel method that strengthens the spatial understanding of current image generation models. We first construct the SpatialReward-Dataset with over 80k preference pairs. Building on this dataset, we build SpatialScore, a reward model designed to evaluate the accuracy of spatial relationships in text-to-image generation, achieving performance that even surpasses leading proprietary models on spatial evaluation. We further demonstrate that this reward model effectively enables online reinforcement learning for complex spatial generation. Extensive experiments across multiple benchmarks show that our specialized reward model yields significant and consistent gains in spatial understanding for image generation.
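The abstract does not state the training objective used for SpatialScore. As a rough illustration only, reward models trained on preference pairs are commonly fit with a Bradley-Terry style pairwise loss, and evaluated by how often they rank the preferred image above the rejected one. The sketch below assumes that setup; the function names and the scalar scores are hypothetical and are not taken from the paper.

```python
import math


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(score_preferred - score_rejected)).

    Assumption: the reward model emits one scalar score per (prompt, image).
    This is a generic objective, not the paper's confirmed training loss.
    """
    margin = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_accuracy(pairs) -> float:
    """Fraction of (preferred_score, rejected_score) pairs ranked correctly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores for two preference pairs.
    scored_pairs = [(1.0, 0.0), (0.2, 0.5)]
    print(pairwise_accuracy(scored_pairs))
    print(bradley_terry_loss(1.0, 0.0))
```

Under this assumed objective, the loss shrinks as the preferred image's score pulls ahead of the rejected one, which is what makes the learned score usable as a reward signal for online reinforcement learning.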

Computer Vision | Multimodal Models | RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
