Mar 18, 2026 | arXiv:2603.17876

Edit Spillover as a Probe: Do Image Editing Models Implicitly Understand World Relations?

AI Summary

The paper introduces EditSpilloverProbe, a framework that evaluates world knowledge in image editing models by analyzing how they alter semantically related content outside the specified edit region. The authors define a taxonomy of spillover types (spatial, semantic, mixed, random) and build a benchmark dataset, EditSpilloverBench, from real-world Chinese text editing tasks. Experiments on five models show spillover rates ranging from 3.49% to 11.46% and a trade-off between editing control and semantic spillover. The share of semantically relevant spillover stays roughly constant with distance (40%-58%), which the authors take as evidence of genuine world understanding rather than spatial diffusion.

Key Contribution

Image editing models reveal part of their world knowledge through "edit spillover" (unintended changes to semantically related regions outside the edit target), and this paper turns that leakage into a probe.

Abstract

Instruction-following image editing models are expected to modify only the specified region while keeping the rest of the image unchanged. However, in practice, we observe a pervasive phenomenon -- edit spillover: models alter semantically related but unspecified content outside the edit region. This raises a fundamental question -- does spillover reflect genuine implicit world understanding, or is it merely attention leakage? We propose EditSpilloverProbe, a systematic framework that repurposes edit spillover as a natural probe for world knowledge in image editing models. We introduce a spillover taxonomy (spatial, semantic, mixed, random), an automated detection-and-classification pipeline, and a benchmark dataset constructed from real-world Chinese text editing tasks, EditSpilloverBench. Systematic evaluation of 5 representative editing models reveals three core findings: (1) spillover rates vary dramatically across architectures, from 3.49% to 11.46%, with a 3.3x ratio; (2) absolute semantic spillover quantity reveals models' world understanding capability -- nano_banana produces the most semantic spillover (27.8 per image), while qwen_2511 has the most precise editing control but lower semantic spillover (16.3 per image), revealing a trade-off between editing control and world understanding; (3) spatial decay analysis shows spillover area density decays exponentially with distance, but the proportion of semantically relevant spillover remains constant (40%-58%), providing direct evidence that semantic spillover reflects genuine world understanding rather than spatial diffusion.
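To make the quantities in the abstract concrete, the following is a minimal sketch of how a spillover rate and a distance-binned spillover density could be computed from a before/after image pair and an edit mask. It is not the paper's detection-and-classification pipeline, which is not described here. The function names, the pixel-difference threshold, the brute-force distance computation, and the bin edges are all assumptions made for illustration.

```python
import numpy as np

def spillover_rate(before, after, edit_mask, threshold=0.1):
    """Fraction of pixels OUTSIDE the edit region that changed.

    `threshold` is an assumed per-pixel change cutoff, not a value from the paper.
    """
    diff = np.abs(after.astype(float) - before.astype(float))
    if diff.ndim == 3:                 # collapse colour channels if present
        diff = diff.max(axis=-1)
    changed = diff > threshold
    outside = ~edit_mask
    spill_mask = changed & outside     # changed pixels that were not meant to change
    return changed[outside].mean(), spill_mask

def density_by_distance(spill_mask, edit_mask, bin_edges):
    """Share of changed pixels per distance bin, measured from the edit region.

    Uses a brute-force nearest-distance search, so it only suits small images.
    """
    ys, xs = np.nonzero(~edit_mask)    # coordinates of pixels outside the edit
    my, mx = np.nonzero(edit_mask)     # coordinates of pixels inside the edit
    dist = np.sqrt((ys[:, None] - my[None, :]) ** 2
                   + (xs[:, None] - mx[None, :]) ** 2).min(axis=1)
    hit = spill_mask[ys, xs]
    idx = np.digitize(dist, bin_edges) - 1
    density = []
    for b in range(len(bin_edges) - 1):
        sel = idx == b
        density.append(float(hit[sel].mean()) if sel.any() else float("nan"))
    return density

# Toy example: an 8x8 image with a 2x2 edit region and one stray changed pixel.
before = np.zeros((8, 8))
edit_mask = np.zeros((8, 8), dtype=bool)
edit_mask[3:5, 3:5] = True
after = before.copy()
after[edit_mask] = 1.0                 # the intended edit
after[0, 0] = 1.0                      # an unintended change far from the edit

rate, spill = spillover_rate(before, after, edit_mask)
density = density_by_distance(spill, edit_mask, bin_edges=[0, 2, 10])
print(rate, density)
```

In a sketch like this, a decay analysis would compare `density` across increasingly distant bins. Checking whether the semantically relevant share stays constant would additionally need a per-region semantic label, which this example does not model.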

Computer Vision | Interpretability & Mechanistic Interp | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
