The paper introduces ADAPT, a training-free framework for generating rare compositional concepts in text-to-image synthesis using diffusion models. ADAPT addresses the limitations of LLM-based prompt scheduling by deterministically planning and semantically aligning prompt schedules based on attention scores and orthogonal components. Experiments on the RareBench benchmark demonstrate that ADAPT significantly improves the compositional generation of rare concepts while maintaining visual integrity.
Ditch the finetuning: this training-free method uses attention scores to generate rare concepts in images with more precision and control than LLM-guided approaches.
Generating rare compositional concepts in text-to-image synthesis remains a challenge for diffusion models, particularly for attributes that are uncommon in the training data. Recent approaches such as R2F address this challenge by using an LLM for prompt scheduling, but they suffer from inherent variance due to the randomness of language models and from suboptimal guidance caused by iterative text embedding switching. To address these problems, we propose ADAPT, a training-free framework that deterministically plans and semantically aligns prompt schedules, providing consistent guidance to enhance the composition of rare concepts. By leveraging attention scores and orthogonal components, ADAPT significantly enhances compositional generation of rare concepts on the RareBench benchmark without additional training or fine-tuning. Through comprehensive experiments, we demonstrate that ADAPT achieves superior performance on RareBench and accurately reflects the semantic information of rare attributes, providing deterministic and precise control over the generation of rare compositions without compromising visual integrity.
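The abstract does not say how the "orthogonal components" are computed. As a rough illustration of the general idea only, the sketch below removes from one vector the part that lies along another, leaving the component orthogonal to it. The function names, the toy three-dimensional "embeddings", and the reading of the term as a vector projection are all assumptions made for illustration; this is not ADAPT's implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_component(target, base):
    """Return the part of `target` that is orthogonal to `base`.

    Hypothetical helper: subtracts the projection of `target` onto `base`.
    """
    base_norm_sq = dot(base, base)
    if base_norm_sq == 0.0:
        # A zero base vector has no direction to project onto.
        return list(target)
    scale = dot(target, base) / base_norm_sq
    return [t - scale * b for t, b in zip(target, base)]


# Toy values standing in for text embeddings (assumed, not from the paper).
base_embedding = [1.0, 0.0, 0.0]   # e.g. a common prompt
rare_embedding = [2.0, 3.0, 0.0]   # e.g. the same prompt with a rare attribute

residual = orthogonal_component(rare_embedding, base_embedding)
print(residual)                        # [0.0, 3.0, 0.0]
print(dot(residual, base_embedding))   # 0.0
```

In this toy case the residual keeps only what the rare vector adds beyond the base direction, which is the intuition behind isolating an attribute's distinctive semantics. Because the computation is deterministic, the same inputs always yield the same result, unlike sampling a schedule from a language model.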