KAIST | Feb 25, 2026 | arXiv:2602.21760

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seong-Rae Cho, Seonghye Cho, Jae-Gil Lee

AI Summary

The paper introduces a hybrid data-pipeline parallelism framework that accelerates conditional diffusion model inference by using the conditional and unconditional denoising paths as the unit of data partitioning. It also proposes an adaptive parallelism switching method that enables pipeline parallelism according to the denoising discrepancy between the two paths. The framework achieves 2.31x and 2.07x latency reductions on SDXL and SD3, respectively, using two NVIDIA RTX 3090 GPUs while maintaining image quality, which shows that it applies to both U-Net and DiT architectures.

Key Contribution

Roughly doubles diffusion inference speed on two GPUs (2.31x on SDXL, 2.07x on SD3) with a hybrid parallelism approach that exploits conditional guidance.

Abstract

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Moreover, current diffusion acceleration methods based on distributed parallelism suffer from noticeable generation artifacts and fail to achieve substantial acceleration proportional to the number of GPUs. Therefore, we propose a hybrid parallelism framework that combines a novel data parallel strategy, condition-based partitioning, with an optimal pipeline scheduling method, adaptive parallelism switching, to reduce generation latency and achieve high generation quality in conditional diffusion models. The key ideas are to (i) leverage the conditional and unconditional denoising paths as a new data-partitioning perspective and (ii) adaptively enable optimal pipeline parallelism according to the denoising discrepancy between these two paths. Our framework achieves $2.31\times$ and $2.07\times$ latency reductions on SDXL and SD3, respectively, using two NVIDIA RTX 3090 GPUs, while preserving image quality. This result confirms the generality of our approach across U-Net-based diffusion models and DiT-based flow-matching architectures. Our approach also outperforms existing methods in acceleration under high-resolution synthesis settings. Code is available at https://github.com/kaist-dmlab/Hybridiff.
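To make idea (i) concrete, here is a minimal sketch of how the two branches of classifier-free guidance could be evaluated as independent tasks and then combined. It is an illustration under stated assumptions, not the authors' implementation: `toy_denoiser`, `guided_step`, and the relative L2 discrepancy are hypothetical names and choices, two worker threads stand in for two GPUs, and the abstract does not say how the paper measures the discrepancy or maps it to a switching decision.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def toy_denoiser(latent, scale):
    """Hypothetical stand-in for one U-Net/DiT forward pass (not a real model)."""
    return [scale * x for x in latent]


def guided_step(latent, guidance_scale, pool):
    """Run the conditional and unconditional paths as two separate tasks.

    Returns the guided noise estimate and a relative L2 discrepancy
    between the two paths (an assumed metric, chosen for illustration).
    """
    # Condition-based partitioning: each path is its own task, so each
    # could be placed on its own device. The scales 0.9 and 0.5 are
    # arbitrary values that make the two toy paths differ.
    cond_future = pool.submit(toy_denoiser, latent, 0.9)
    uncond_future = pool.submit(toy_denoiser, latent, 0.5)
    eps_cond = cond_future.result()
    eps_uncond = uncond_future.result()

    # Standard classifier-free guidance combination:
    # eps = eps_uncond + w * (eps_cond - eps_uncond)
    eps = [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

    # Relative gap between the two paths.
    diff = math.sqrt(sum((c - u) ** 2 for c, u in zip(eps_cond, eps_uncond)))
    norm = math.sqrt(sum(u ** 2 for u in eps_uncond))
    discrepancy = diff / norm if norm > 0 else 0.0
    return eps, discrepancy


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        eps, discrepancy = guided_step([1.0, 2.0], 2.0, pool)
    print(eps, discrepancy)
```

In a real deployment the two tasks would be model forward passes on separate GPUs, and a scheduler would read a quantity like `discrepancy` at each step to decide when to switch parallelism modes.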

Computer Vision, Distributed Systems & Hardware, Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
