This study re-evaluates few-step distillation for visual generative models, focusing on the training recipe rather than on the distillation objective alone. Using Qwen-Image-2.0 as a representative case, the authors examine how data composition, teacher guidance, and task mixture affect student performance in text-to-image generation and instruction-guided image editing, and they report several non-obvious behaviors. These findings motivate Qwen-Image-Flash and suggest that effective distillation depends on a well-organized training pipeline as well as on carefully designed objectives.
Effective few-step distillation depends on how the training pipeline is organized, not only on the distillation objective.
Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image generation and instruction-guided image editing distillation: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.