This paper introduces a dynamic, training-free LoRA fusion framework for subject and style synthesis in diffusion models. At each layer, the method dynamically selects between the subject and style LoRA weights based on the KL divergence between the base model's features and the LoRA-modified features. It further refines the denoising trajectory with gradient-based corrections guided by CLIP and DINO scores, which improves subject-style coherence.
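The per-layer selection step can be illustrated with a toy, pure-Python sketch. Several pieces are assumptions rather than details from the paper. A softmax turns the features into probability distributions. The LoRA whose output diverges more from the base output is the one selected. The layer is a tiny 2x2 linear map with hypothetical rank-1 LoRA matrices. In a real diffusion backbone, the same comparison would run on feature tensors at every LoRA-applied layer during the forward pass.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_linear(x: Sequence[float], w: Matrix, a: Matrix, b: Matrix,
                scale: float = 1.0) -> Vector:
    """Linear layer with a LoRA update: W x + scale * B (A x)."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (ASSUMPTION: how features become distributions)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def select_lora(base_feat: Sequence[float], subject_feat: Sequence[float],
                style_feat: Sequence[float]) -> Tuple[str, Sequence[float]]:
    """Return the LoRA output whose distribution diverges most from the base output.

    ASSUMPTION: the "larger divergence wins" rule is illustrative only; the
    paper's exact selection criterion may differ.
    """
    p = softmax(base_feat)
    kl_subject = kl_divergence(p, softmax(subject_feat))
    kl_style = kl_divergence(p, softmax(style_feat))
    if kl_subject >= kl_style:
        return "subject", subject_feat
    return "style", style_feat


if __name__ == "__main__":
    x = [1.0, 0.0]
    w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
    subject_a = [[1.0, 0.0]]       # hypothetical subject LoRA down-projection
    subject_b = [[0.0], [2.0]]     # hypothetical subject LoRA up-projection
    style_a = [[1.0, 0.0]]         # hypothetical style LoRA down-projection
    style_b = [[0.0], [0.1]]       # hypothetical style LoRA up-projection

    base = matvec(w, x)
    subject = lora_linear(x, w, subject_a, subject_b)
    style = lora_linear(x, w, style_a, style_b)
    print(select_lora(base, subject, style))  # ('subject', [1.0, 2.0])
```

Here the subject LoRA shifts the features far more than the style LoRA, so it is selected for this layer. Preferring the smaller divergence, or blending both outputs with divergence-based weights instead of making a hard choice, would be equally easy to express with the same helpers.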
Forget static LoRA fusion: this method dynamically blends subject and style, guided by feature divergence and CLIP/DINO metrics, for more coherent subject-style image synthesis.
Recent studies have explored combining multiple LoRAs to generate user-specified subjects and styles simultaneously. However, most existing approaches fuse LoRA weights using static statistical heuristics, which deviate from LoRA's original purpose of learning adaptive feature adjustments and ignore the randomness of sampled inputs. To address this, we propose a dynamic, training-free fusion framework that operates throughout the generation process. During the forward pass, at each LoRA-applied layer, we compute the KL divergence between the base model's original features and the features produced by the subject and style LoRAs, respectively, and adaptively select the most appropriate weights for fusion. In the reverse denoising stage, we further refine the generation trajectory by dynamically applying gradient-based corrections derived from objective metrics such as CLIP and DINO scores, providing continuous semantic and stylistic guidance. By integrating these two complementary mechanisms (feature-level selection and metric-guided latent adjustment) across the entire diffusion timeline, our method achieves coherent subject-style synthesis without any retraining. Extensive experiments across diverse subject-style combinations demonstrate that our approach consistently outperforms state-of-the-art LoRA fusion methods both qualitatively and quantitatively.
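The metric-guided correction in the reverse stage can be illustrated with a minimal sketch. It assumes the CLIP and DINO encoders are replaced by the identity map. Under that assumption, the latent itself is compared by cosine similarity with two hypothetical reference embeddings, and the gradient can be written analytically. A real pipeline would instead decode the (predicted) latent, embed it with the actual encoders, and obtain the gradient through automatic differentiation. The names `guided_step`, `guidance_score`, `w_subject`, `w_style`, and `step_size` are hypothetical, and their default values are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: Sequence[float]) -> float:
    return math.sqrt(dot(u, u))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, the form used by CLIP- and DINO-style scores."""
    return dot(u, v) / (norm(u) * norm(v))


def cosine_grad(z: Sequence[float], t: Sequence[float]) -> Vector:
    """Analytic gradient of cos(z, t) with respect to z."""
    nz, nt = norm(z), norm(t)
    c = dot(z, t)
    return [ti / (nz * nt) - c * zi / (nz ** 3 * nt) for zi, ti in zip(z, t)]


def guidance_score(z: Sequence[float], subject_ref: Sequence[float],
                   style_ref: Sequence[float], w_subject: float = 1.0,
                   w_style: float = 1.0) -> float:
    """Weighted sum of subject (DINO-like) and style (CLIP-like) similarities."""
    return w_subject * cosine(z, subject_ref) + w_style * cosine(z, style_ref)


def guided_step(z: Sequence[float], subject_ref: Sequence[float],
                style_ref: Sequence[float], step_size: float = 0.1,
                w_subject: float = 1.0, w_style: float = 1.0) -> Vector:
    """One gradient-ascent correction of the latent toward both references."""
    g_sub = cosine_grad(z, subject_ref)
    g_sty = cosine_grad(z, style_ref)
    return [zi + step_size * (w_subject * gs + w_style * gy)
            for zi, gs, gy in zip(z, g_sub, g_sty)]


if __name__ == "__main__":
    z = [1.0, 0.0, 0.0]            # toy latent
    subject_ref = [0.0, 1.0, 0.0]  # hypothetical subject embedding
    style_ref = [0.0, 0.0, 1.0]    # hypothetical style embedding
    for t in range(5):
        # A real sampler would run the denoiser for timestep t before this correction.
        z = guided_step(z, subject_ref, style_ref)
        print(t, round(guidance_score(z, subject_ref, style_ref), 4))
```

Applying the correction at every timestep, rather than once at the end, mirrors the abstract's description of continuous guidance along the trajectory. The two weights trade subject fidelity against style strength.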