Feb 15, 2026 · arXiv:2602.14237

AbracADDbra: Touch-Guided Object Addition by Decoupling Placement and Editing Subtasks

Kunal Swami, Raghu Chittersu, Yuvraj Rathore, Rajeev Irny, Shashavali Doodekula, Alok Shukla

AI Summary

The paper introduces AbracADDbra, a framework for touch-guided object addition that addresses the limitations of text-only and mask-based approaches by using touch priors to spatially ground instructions. It uses a decoupled architecture: a vision-language transformer handles touch-guided placement, and a diffusion model then jointly generates the object and an instance mask, which enables high-fidelity blending. Experiments on the newly introduced Touch2Add benchmark show that the placement model outperforms random placement and general-purpose VLM baselines, and that initial placement accuracy strongly correlates with final edit quality.
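The decoupled flow summarized above (placement first, then generation, then mask-based blending) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `place` and `generate` are hypothetical stand-ins for the vision-language transformer and the diffusion model, images are toy grayscale lists, boxes are `(x0, y0, x1, y1)`, and blending is assumed to be per-pixel alpha compositing with the instance mask. The `toy_place` and `toy_generate` stubs are purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]        # toy grayscale image, H x W, values in [0, 1]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Point = Tuple[int, int]          # touch location as (x, y)


@dataclass
class Edit:
    box: Box
    result: Image


def composite(original: Image, generated: Image, mask: Image) -> Image:
    """Per-pixel alpha blend: mask=1 takes the generated pixel, mask=0 keeps the original."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def add_object(
    image: Image,
    touch: Point,
    instruction: str,
    place: Callable[[Image, Point, str], Box],
    generate: Callable[[Image, Box, str], Tuple[Image, Image]],
) -> Edit:
    # Stage 1 (placement): map image + touch + instruction to a target box.
    box = place(image, touch, instruction)
    # Stage 2 (editing): produce a full-size edited image plus an instance mask.
    generated, mask = generate(image, box, instruction)
    # Stage 3 (blending): pixels outside the mask are copied from the input.
    return Edit(box=box, result=composite(image, generated, mask))


# --- Hypothetical toy stubs standing in for the real models ---

def toy_place(image: Image, touch: Point, instruction: str, size: int = 2) -> Box:
    """Center a fixed-size box on the touch point, clipped to the image (ignores the text)."""
    h, w = len(image), len(image[0])
    x, y = touch
    x0 = max(0, min(x - size // 2, w - size))
    y0 = max(0, min(y - size // 2, h - size))
    return (x0, y0, x0 + size, y0 + size)


def toy_generate(image: Image, box: Box, instruction: str) -> Tuple[Image, Image]:
    """Paint the box white and deliberately alter the background, to show the mask protects it."""
    x0, y0, x1, y1 = box

    def inside(r: int, c: int) -> bool:
        return y0 <= r < y1 and x0 <= c < x1

    h, w = len(image), len(image[0])
    generated = [[1.0 if inside(r, c) else 0.5 for c in range(w)] for r in range(h)]
    mask = [[1.0 if inside(r, c) else 0.0 for c in range(w)] for r in range(h)]
    return generated, mask


if __name__ == "__main__":
    blank = [[0.0] * 4 for _ in range(4)]
    edit = add_object(blank, (3, 3), "add a lamp here", toy_place, toy_generate)
    print(edit.box)  # (2, 2, 4, 4)
    for row in edit.result:
        print(row)
```

Keeping placement and editing behind separate callables mirrors the decoupling the summary describes: either stage can be swapped or evaluated on its own. In the toy run, the generator changes background pixels to 0.5, yet the blended result keeps them at 0.0 because the mask is zero there.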

Key Contribution

AbracADDbra makes touch-guided object addition more intuitive and accurate: its decoupled architecture separates placement from editing, and its placement model outperforms both random placement and general-purpose VLM baselines on the Touch2Add benchmark.

Abstract

Instruction-based object addition is often hindered by the ambiguity of text-only prompts or the tedious nature of mask-based inputs. To address this usability gap, we introduce AbracADDbra, a user-friendly framework that leverages intuitive touch priors to spatially ground succinct instructions for precise placement. Our efficient, decoupled architecture uses a vision-language transformer for touch-guided placement, followed by a diffusion model that jointly generates the object and an instance mask for high-fidelity blending. To facilitate standardized evaluation, we contribute the Touch2Add benchmark for this interactive task. Our extensive evaluations, in which our placement model significantly outperforms both random placement and general-purpose VLM baselines, confirm the framework's ability to produce high-fidelity edits. Furthermore, our analysis reveals a strong correlation between initial placement accuracy and final edit quality, validating our decoupled approach. This work thus paves the way for more accessible and efficient creative tools.
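The abstract's claim that placement accuracy correlates with edit quality can be made concrete with a small sketch. The metrics here are assumptions, not the paper's stated protocol: placement accuracy is taken to be intersection-over-union (IoU) between a predicted box and a reference box, and the association is measured with the Pearson correlation coefficient (a rank correlation such as Spearman's would be a reasonable alternative). Boxes are assumed to be `(x0, y0, x1, y1)`.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed convention


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if they do not overlap)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Given hypothetical per-sample lists `predicted_boxes`, `reference_boxes`, and `edit_scores` (from whatever edit-quality metric is used), the correlation would be `pearson([iou(p, r) for p, r in zip(predicted_boxes, reference_boxes)], edit_scores)`; a value near 1 would indicate that better placements go with better final edits.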

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A