CUHK, Westlake | May 25, 2026 | arXiv:2605.25759

Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences

AI Summary

The paper introduces Alignment via Synthetic Anatomical Preference (ASAP), a framework for improving anatomical correctness in text-to-image human image generation. ASAP constructs a Human Anatomical Preference (HAP) dataset of over 10K preference pairs by applying localized anatomical degradations to high-fidelity human images, enabling controlled preference-based alignment. The authors further propose a localized and margin-bounded DPO variant that focuses optimization on targeted anatomical regions. They report improved anatomical fidelity without sacrificing overall image quality, evaluated on their new benchmark, HAF-Bench.
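The pair construction described above can be illustrated with a minimal sketch. The summary does not say how the degradations are produced, so the code below substitutes a hypothetical stand-in (mirroring the pixels inside a bounding box) purely to show the shape of a controlled pair: the preferred and rejected images differ only inside the targeted region. The names `make_preference_pair`, `Image`, and `Box` are assumptions for illustration, not part of ASAP.

```python
from typing import List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive


def make_preference_pair(image: Image, box: Box) -> Tuple[Image, Image, List[List[int]]]:
    """Return (preferred, rejected, mask) for one localized degradation.

    Assumption: the degradation is a horizontal mirror of the boxed region.
    This is only a placeholder for whatever anatomical corruption is used.
    """
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("box must lie inside the image")

    preferred = [row[:] for row in image]   # untouched high-fidelity image
    rejected = [row[:] for row in image]    # copy that receives the degradation
    mask = [[0] * width for _ in range(height)]

    for r in range(top, bottom):
        # Degrade only the targeted region; everything else stays identical.
        rejected[r][left:right] = image[r][left:right][::-1]
        for c in range(left, right):
            mask[r][c] = 1
    return preferred, rejected, mask


if __name__ == "__main__":
    img = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
    good, bad, region = make_preference_pair(img, (0, 1, 2, 3))
    print(bad)
    print(region)
```

Returning the mask alongside the pair is the useful part of this design: it records where the two images differ, which is the information a localized objective would need.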

Key Contribution

Fix freaky AI-generated hands and other anatomical nightmares with a new preference learning method that surgically corrects anatomical errors in diffusion models.

Abstract

Large-scale text-to-image foundation models have achieved remarkable visual realism, yet generating human images with correct anatomical structures remains challenging. Existing approaches enforce anatomical constraints through part-specific modules or localized loss weighting during supervised fine-tuning on high-quality human photos, but such datasets are limited and often provide ambiguous optimization signals due to confounding factors such as lighting, pose, and background. Preference-based alignment offers an alternative, but standard Direct Preference Optimization (DPO) treats all pixels equally and therefore fails to exploit the localized nature of anatomical artifacts. To address this, we propose the framework of Alignment via Synthetic Anatomical Preference (ASAP), which constructs controlled preference pairs through a localized degradation mechanism applied to high-fidelity human images. This mechanism performs a controlled experiment on images by introducing explicit anatomical errors in targeted regions while preserving the remaining content. With this mechanism, we create the Human Anatomical Preference (HAP) dataset with over 10K curated pairs for effective anatomical alignment of text-to-image human image generative models. To better leverage the locality of these controlled preference pairs, we introduce a localized and margin-bounded variant of DPO that prioritizes optimization in targeted anatomical regions while enforcing a finite preference margin to prevent over-optimization and preserve global semantics. We further introduce HAF-Bench, a benchmark for systematic evaluation of anatomical fidelity. Extensive experiments demonstrate that ASAP consistently reduces anatomical errors across multiple foundation models while maintaining overall image quality.
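To make the idea of a localized, margin-bounded objective concrete, here is a minimal sketch under stated assumptions. The abstract does not give the loss, so this follows the general shape of a DPO objective for diffusion models (comparing denoising errors of the policy and a frozen reference on the preferred and rejected samples) and adds two hypothetical ingredients: a mask-weighted error (`region_weight`) and a hard cap on the preference margin (`max_margin`). Both choices, and all function names, are illustrative and may differ from the paper's actual formulation.

```python
import math
from typing import Sequence


def weighted_error(pred: Sequence[float], target: Sequence[float],
                   mask: Sequence[int], region_weight: float = 3.0) -> float:
    """Mean squared error where masked (anatomical) pixels count more.

    Assumption: a constant up-weighting inside the mask.
    """
    total, weight_sum = 0.0, 0.0
    for p, t, m in zip(pred, target, mask):
        w = region_weight if m else 1.0
        total += w * (p - t) ** 2
        weight_sum += w
    return total / weight_sum


def bounded_dpo_loss(err_w_policy: float, err_w_ref: float,
                     err_l_policy: float, err_l_ref: float,
                     beta: float = 1.0, max_margin: float = 2.0) -> float:
    """DPO-style loss with the preference margin capped at max_margin.

    err_w_* are errors on the preferred image, err_l_* on the rejected one.
    """
    # Positive margin: the policy improves on the preferred sample more than
    # on the rejected one, relative to the reference model.
    margin = -beta * ((err_w_policy - err_w_ref) - (err_l_policy - err_l_ref))
    # Assumed bounding rule: beyond max_margin the loss stops decreasing,
    # so there is no incentive to keep widening the gap.
    x = min(margin, max_margin)
    # Numerically stable -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    print(weighted_error([1.0, 0.0], [0.0, 0.0], [1, 0]))
    print(bounded_dpo_loss(0.1, 0.5, 0.5, 0.5))
```

In a real training loop the four error terms would come from `weighted_error` applied to the policy and reference noise predictions, and the cap would act on gradients through an autodiff framework; this plain-Python version only evaluates the loss value.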

Computer Vision | Data Curation & Synthetic Data | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
