Samsung, Sookmyung Women's University, Yonsei | Jun 17, 2026 | arXiv:2606.18906

BindEdit: Taming Attention Leakage for Precise Multi-Object Image Editing

Chaewon Park, Soyoon Lee, Naeun Lee, Minjung Shin, Seogkyu Jeon, Kibeom Hong

AI Summary

This paper introduces BindEdit, an approach to attention leakage in multi-object image editing, a failure mode that causes semantic blending, object duplication, and incomplete edits. It identifies two forms of leakage (Edit-Token Leakage and Source Dominance Leakage) and enforces attention-level constraints within a single diffusion trajectory to improve edit precision in complex scenes. The authors report that BindEdit consistently outperforms existing methods in both single- and multi-object editing tasks.

Key Contribution

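BindEdit attributes multi-object editing failures to two forms of attention leakage, Edit-Token Leakage and Source Dominance Leakage, and suppresses both with attention-level constraints: joint regularization of cross- and self-attention, cross-attention re-balancing, and a region fidelity term. The paper also proposes a multi-object editing benchmark covering diverse object counts and categories.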

Abstract

Real image editing enables precise manipulation of visual content, yet existing methods often fail in complex multi-object scenarios, causing semantic blending, object duplication, or incomplete edits. We attribute these failures to attention leakage, where signals across spatial regions and text tokens become entangled during the denoising process. Specifically, we identify two distinct forms of leakage: Edit-Token Leakage, where ambiguous token-region alignment leads to object blending, and Source Dominance Leakage, where tokens of unchanged source objects overwhelm the attention intended for target entities. To resolve these leakages, we propose BindEdit, which enforces attention-level constraints within a single diffusion trajectory. To suppress Edit-Token Leakage, BindEdit jointly regularizes cross- and self-attention so that each target token group is bound to its corresponding spatial region while maintaining instance-level separation. To suppress Source Dominance Leakage, a cross-attention re-balancing mechanism amplifies target token influence and attenuates residual source semantics within editable regions. Moreover, a region fidelity term ensures that each target concept is expressed coherently across the entire editing mask. Additionally, we propose a comprehensive multi-object benchmark encompassing diverse object counts and categories. Extensive experiments demonstrate that BindEdit consistently outperforms existing methods within a single diffusion trajectory, maintaining robust performance across both single- and multi-object editing scenarios.
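The abstract does not give the exact formulation of the cross-attention re-balancing step, so the following is only a minimal sketch of the general idea under stated assumptions: cross-attention is taken to be a row-normalized map of shape (pixels, tokens), and the function name `rebalance_cross_attention`, the scalar factors `amplify` and `attenuate`, and the renormalization step are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np

def rebalance_cross_attention(attn, mask, target_idx, source_idx,
                              amplify=1.5, attenuate=0.5):
    """Illustrative re-balancing of a cross-attention map (assumed design).

    attn:       (num_pixels, num_tokens) array, each row sums to 1.
    mask:       (num_pixels,) boolean array, True inside the editable region.
    target_idx: indices of target (edited) tokens to amplify.
    source_idx: indices of unchanged source tokens to attenuate.
    """
    attn = np.asarray(attn, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Per-token scaling: boost target tokens, damp source tokens.
    scale = np.ones(attn.shape[1])
    scale[list(target_idx)] = amplify
    scale[list(source_idx)] = attenuate

    out = attn.copy()
    # Only pixels inside the edit mask are modified.
    scaled = attn[mask] * scale
    # Renormalize so each modified row is again a probability distribution.
    out[mask] = scaled / scaled.sum(axis=1, keepdims=True)
    return out
```

In this sketch, pixels outside the mask keep their original attention, which mirrors the abstract's statement that re-balancing acts "within editable regions". A real implementation would operate on attention tensors inside the denoising network and may use a different scaling rule.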

Computer Vision Multimodal Models


Year: 2026
