Tsinghua AICityUTelecom | Apr 6, 2026 | arXiv:2604.04911

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

Yicheng Xiao, Wenhu Zhang, Lin Song, Tianhe Ren, Haokun Lin, Haoyang Huang, Xiu Li

AI Summary

The paper introduces SpatialEdit-Bench, a benchmark for evaluating fine-grained image spatial editing, which jointly measures perceptual plausibility and geometric fidelity using viewpoint reconstruction and framing analysis. To facilitate training, they create SpatialEdit-500k, a large-scale synthetic dataset with ground-truth transformations generated using a controllable Blender pipeline. Finally, they develop SpatialEdit-16B, a baseline model trained on this data, demonstrating superior performance on spatial manipulation tasks compared to existing methods.

Key Contribution

Existing image editing models fall short when it comes to precise spatial manipulations, but a new benchmark and dataset reveal the path to closing the gap.

Abstract

Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are insufficient for fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a complete benchmark that evaluates spatial editing by jointly measuring perceptual plausibility and geometric fidelity via viewpoint reconstruction and framing analysis. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object- and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing. Our method achieves competitive performance on general editing while substantially outperforming prior methods on spatial manipulation tasks. All resources will be made public at https://github.com/EasonXiao-888/SpatialEdit.

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
