The paper introduces SpatialID, a training-free framework for personalized text-to-image generation. It targets a side effect of spatially uniform visual injection: identity features contaminating non-facial regions. SpatialID decouples identity injection into face-relevant and context-free regions using a Spatial Mask Extractor derived from cross-attention responses, and it employs a Temporal-Spatial Scheduling strategy that dynamically adjusts the spatial constraints during diffusion. Experiments on IBench show that SpatialID achieves state-of-the-art text adherence, visual consistency, and image quality by eliminating background contamination while preserving identity.
Stop identity features from bleeding into your backgrounds: SpatialID uses attention-based masks to inject personalized visual features only where they matter, without any training.
Personalized text-to-image generation aims to integrate specific identities into arbitrary contexts. However, existing tuning-free methods typically employ Spatially Uniform Visual Injection, which causes identity features to contaminate non-facial regions (e.g., backgrounds and lighting) and degrades text adherence. To address this without expensive fine-tuning, we propose SpatialID, a training-free, spatially adaptive identity modulation framework. SpatialID decouples identity injection into face-relevant and context-free regions using a Spatial Mask Extractor derived from cross-attention responses. Furthermore, we introduce a Temporal-Spatial Scheduling strategy that dynamically adjusts spatial constraints, transitioning from Gaussian priors to attention-based masks and then to adaptive relaxation, to align with the dynamics of diffusion generation. Extensive experiments on IBench demonstrate that SpatialID achieves state-of-the-art performance in text adherence (CLIP-T: 0.281), visual consistency (CLIP-I: 0.827), and image quality (IQ: 0.523), largely eliminating background contamination while maintaining robust identity preservation.
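
The abstract describes the mechanism only at a high level, so the following is a minimal NumPy sketch of the general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the phase boundaries (`early_end`, `late_start`), min-max normalization with a fixed threshold as the mask extractor, a constant `relax` floor standing in for "adaptive relaxation", and additive injection of an identity update into the hidden features.

```python
import numpy as np


def gaussian_prior_mask(h, w, sigma=0.25):
    """Centered isotropic Gaussian on an h x w grid (assumed early-step prior)."""
    ys = np.linspace(-0.5, 0.5, h)[:, None]
    xs = np.linspace(-0.5, 0.5, w)[None, :]
    return np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma ** 2))


def attention_mask(attn, threshold=0.5):
    """Min-max normalize a 2D attention map and binarize it (assumed extractor)."""
    span = attn.max() - attn.min()
    norm = (attn - attn.min()) / (span + 1e-8)
    return (norm >= threshold).astype(np.float64)


def scheduled_mask(progress, attn, early_end=0.2, late_start=0.8, relax=0.3):
    """Pick a spatial mask from denoising progress in [0, 1].

    Hypothetical three-phase schedule: Gaussian prior, then the
    attention-derived mask, then a relaxed mask with a nonzero floor.
    """
    h, w = attn.shape
    if progress < early_end:
        return gaussian_prior_mask(h, w)
    mask = attention_mask(attn)
    if progress >= late_start:
        # Relaxation (assumed form): allow a small identity signal everywhere.
        mask = np.maximum(mask, relax)
    return mask


def inject_identity(hidden, id_update, mask, scale=1.0):
    """Add the identity update only where the mask is nonzero.

    hidden, id_update: arrays of shape (h, w, c); mask: shape (h, w).
    """
    return hidden + scale * mask[..., None] * id_update


# Toy usage: a 4 x 4 feature map whose attention peaks on the central 2 x 2 block.
attn = np.zeros((4, 4))
attn[1:3, 1:3] = 1.0
hidden = np.ones((4, 4, 2))
id_update = np.full((4, 4, 2), 2.0)

mid_mask = scheduled_mask(0.5, attn)
mid_out = inject_identity(hidden, id_update, mid_mask)
```

In the middle phase of this toy example, the corner positions of `mid_out` keep their original values while the central block receives the full identity update. That is the contrast with spatially uniform injection, which would modify every position equally.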