This paper introduces a training-free image editing framework that combines an optimised Delta Denoising Score (DDS) with LoRA-driven concept composition to enable multi-concept manipulation. The method addresses the limitations of text-only editing by incorporating visual concepts captured by LoRA adapters, each trained on data representing a specific visual attribute. By refining DDS with ordered timesteps, regularisation, and negative-prompt guidance, the framework improves the stability and controllability of edits.
Forget finetuning: this method lets you inject visual concepts like "rough skin" or "metallic texture" into diffusion-based image edits using pretrained LoRA adapters, without any further training.
Editing images with diffusion models without training remains challenging. While recent optimisation-based methods achieve strong zero-shot edits from text, they struggle to preserve identity or capture details that language alone cannot express. Many visual concepts, such as facial structure, material texture, or object geometry, cannot be conveyed through text prompts alone. To address this gap, we introduce a training-free framework for concept-based image editing that unifies Optimised DDS with LoRA-driven concept composition, in which each LoRA's training data represent the concept. Our approach enables combining and controlling multiple visual concepts directly within the diffusion process, integrating semantic guidance from text with low-level cues from pretrained concept adapters. We further refine DDS for stability and controllability through ordered timesteps, regularisation, and negative-prompt guidance. Quantitative and qualitative results demonstrate consistent improvements over existing training-free diffusion editing methods on the InstructPix2Pix and ComposLoRA benchmarks. Code will be made publicly available.
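To make the DDS-style update concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical noise predictor `toy_eps` that stands in for a real diffusion model, represents each prompt or concept by a single mean vector, interprets "ordered timesteps" as a fixed sweep from high to low noise, and interprets "regularisation" as an L2 pull towards the source latent. All function names and parameters here are illustrative assumptions; negative-prompt guidance and LoRA composition are omitted.

```python
import numpy as np

def toy_eps(z_t, mu, alpha_bar):
    # Hypothetical noise predictor (assumption): exact for a toy "model"
    # whose data for a given prompt is the single point mu.
    return (z_t - np.sqrt(alpha_bar) * mu) / np.sqrt(1.0 - alpha_bar)

def dds_edit(z_src, mu_src, mu_tgt, steps=200, lr=0.1, reg=0.0, seed=0):
    """DDS-style optimisation of an edited latent z, starting from z_src."""
    rng = np.random.default_rng(seed)
    z = z_src.copy()
    # Ordered schedule (assumption): sweep from high noise (small alpha_bar)
    # to low noise (large alpha_bar) instead of sampling timesteps at random.
    for a in np.linspace(0.1, 0.9, steps):
        eps = rng.standard_normal(z.shape)
        # Noise the edited and the source latent with the SAME noise and timestep.
        z_t = np.sqrt(a) * z + np.sqrt(1.0 - a) * eps
        z_src_t = np.sqrt(a) * z_src + np.sqrt(1.0 - a) * eps
        # Delta of the two noise predictions: target branch minus source branch.
        grad = toy_eps(z_t, mu_tgt, a) - toy_eps(z_src_t, mu_src, a)
        # Optional L2 regulariser (assumption) that keeps z close to the source.
        grad = grad + reg * (z - z_src)
        z = z - lr * grad
    return z

z_src = np.array([1.0, 2.0, 3.0])    # source "image" latent
mu_src = np.zeros(3)                 # toy embedding of the source prompt
mu_tgt = np.array([0.5, 0.0, -0.5])  # toy embedding of the target concept

edited = dds_edit(z_src, mu_src, mu_tgt)
edited_reg = dds_edit(z_src, mu_src, mu_tgt, reg=1.0)
print(edited - z_src)      # shift applied without regularisation
print(edited_reg - z_src)  # smaller shift with regularisation
```

In this toy setting the shared noise cancels in the delta, so the unregularised edit shifts the source by the difference between the target and source concept vectors while leaving the rest of the source untouched; the regulariser shrinks that shift. A real system would replace `toy_eps` with a text- and LoRA-conditioned denoiser.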