This paper introduces BindEdit, an approach to attention leakage in multi-object image editing, a failure mode that often causes semantic blending and incomplete edits. BindEdit identifies two specific forms of leakage, Edit-Token Leakage and Source Dominance Leakage, and suppresses them with attention-level constraints that improve edit precision in complex scenes. Experiments show that BindEdit consistently outperforms existing methods, with robust performance on both single- and multi-object editing tasks.
Attention leakage in multi-object image editing causes semantic blending and incomplete edits; BindEdit suppresses it with attention-level constraints, improving precision in complex multi-object scenes.
Real image editing enables precise manipulation of visual content, yet existing methods often fail in complex multi-object scenarios, causing semantic blending, object duplication, or incomplete edits. We attribute these failures to attention leakage, where signals across spatial regions and text tokens become entangled during the denoising process. Specifically, we identify two distinct forms of leakage: Edit-Token Leakage, where ambiguous token-region alignment leads to object blending, and Source Dominance Leakage, where tokens of unchanged source objects overwhelm the attention intended for target entities. To address both forms of leakage, we propose BindEdit, which enforces attention-level constraints within a single diffusion trajectory. To suppress Edit-Token Leakage, BindEdit jointly regularizes cross- and self-attention so that each target token group is bound to its corresponding spatial region while maintaining instance-level separation. To suppress Source Dominance Leakage, a cross-attention re-balancing mechanism amplifies the influence of target tokens and attenuates residual source semantics within editable regions. Moreover, a region fidelity term ensures that each target concept is expressed coherently across the entire editing mask. We also introduce a comprehensive multi-object benchmark encompassing diverse object counts and categories. Extensive experiments demonstrate that BindEdit consistently outperforms existing methods within a single diffusion trajectory, maintaining robust performance across both single- and multi-object editing scenarios.
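The abstract does not say how the re-balancing or binding constraints are computed, so the following is only a minimal sketch of one plausible reading, under stated assumptions. Cross-attention is taken to be a row-normalized matrix of spatial query positions by text tokens. Re-balancing multiplies target-token weights by a hypothetical `amplify` factor and source-token weights by a hypothetical `attenuate` factor inside the edit mask, then renormalizes each row. The binding penalty is assumed to be the fraction of each target token group's attention mass that falls outside its assigned region. The names `rebalance_cross_attention` and `cross_attention_binding_loss`, their parameters, and the loss form are illustrative assumptions, not BindEdit's actual formulation; the self-attention separation and region fidelity terms are not modeled.

```python
from typing import Dict, List, Sequence, Set

# Assumed layout (hypothetical): attn[i][j] is the cross-attention weight from
# spatial query position i to text token j, and each row sums to 1.
Attention = List[List[float]]


def rebalance_cross_attention(
    attn: Attention,
    edit_mask: Sequence[bool],
    target_tokens: Set[int],
    source_tokens: Set[int],
    amplify: float = 2.0,    # hypothetical boost for target tokens
    attenuate: float = 0.5,  # hypothetical damping for residual source tokens
) -> Attention:
    """Inside the editable region, scale target-token weights up and
    source-token weights down, then renormalize each row to sum to 1."""
    out: Attention = []
    for row, editable in zip(attn, edit_mask):
        if not editable:
            # Positions outside the edit mask keep their original attention.
            out.append(list(row))
            continue
        scaled = [
            w * amplify if j in target_tokens
            else w * attenuate if j in source_tokens
            else w
            for j, w in enumerate(row)
        ]
        total = sum(scaled)
        out.append([w / total for w in scaled] if total > 0 else list(row))
    return out


def cross_attention_binding_loss(
    attn: Attention,
    region_masks: Dict[str, Sequence[bool]],
    token_groups: Dict[str, Set[int]],
) -> float:
    """Hypothetical binding penalty: for each target token group, the share of
    its attention mass lying outside its region, averaged over groups
    (0.0 means every group attends only inside its own region)."""
    losses = []
    for name, tokens in token_groups.items():
        inside = outside = 0.0
        for row, in_region in zip(attn, region_masks[name]):
            mass = sum(row[j] for j in tokens)
            if in_region:
                inside += mass
            else:
                outside += mass
        total = inside + outside
        losses.append(outside / total if total > 0 else 0.0)
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy example: 4 spatial positions and 3 tokens
    # (token 0 = source object, token 1 = target "dog", token 2 = target "hat").
    attn = [
        [0.5, 0.25, 0.25],
        [0.5, 0.25, 0.25],
        [0.25, 0.25, 0.5],
        [0.25, 0.25, 0.5],
    ]
    masks = {"dog": [True, True, False, False], "hat": [False, False, True, True]}
    groups = {"dog": {1}, "hat": {2}}

    before = cross_attention_binding_loss(attn, masks, groups)
    balanced = rebalance_cross_attention(attn, masks["dog"], {1}, {0})
    after = cross_attention_binding_loss(balanced, masks, groups)
    print(f"binding loss before: {before:.3f}, after: {after:.3f}")
    # prints: binding loss before: 0.417, after: 0.333
```

Renormalizing after scaling keeps every row a valid distribution, so the re-balanced map has the same shape and semantics as the original. In the toy example, re-balancing shifts attention from the source token to the "dog" token inside its region, and the assumed binding penalty drops accordingly.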