LegoDiffusion decomposes text-to-image diffusion workflows into model-execution nodes that are managed and scheduled independently, enabling fine-grained resource management. This micro-serving approach unlocks cluster-scale optimizations such as per-model scaling, model sharing, and adaptive model parallelism, addressing the limitations of monolithic workflow serving systems. In the reported experiments, LegoDiffusion sustains up to 3x higher request rates and tolerates up to 8x higher burst traffic than existing systems.
Stop treating diffusion workflows as monolithic black boxes: LegoDiffusion sustains up to 3x higher request rates by decomposing them into independently scalable model-execution nodes.
Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely coupled model-execution nodes that can be independently managed and scheduled. By explicitly managing individual model inference, LegoDiffusion unlocks cluster-scale optimizations, including per-model scaling, model sharing, and adaptive model parallelism. Collectively, LegoDiffusion outperforms existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up to 8x higher burst traffic.
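To make the contrast between monolithic serving and micro-serving concrete, the following is a minimal sketch and not LegoDiffusion's actual implementation or API. It assumes a workflow can be modeled as a list of model-execution nodes, that each model has a fixed per-replica capacity in invocations per second, and that replicas of the same model can be shared across workflows. The model names, request rates, capacities, and helper functions (`per_model_demand`, `replicas_needed`, `monolithic_replicas`) are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One model-execution step in a workflow (hypothetical structure)."""
    model: str                  # illustrative model identifier
    calls_per_request: int = 1  # invocations of this model per request


def per_model_demand(workflows, request_rates):
    """Aggregate invocation rate per model across all workflows.

    Workflows that reference the same model feed one shared pool,
    which is the model-sharing assumption of this sketch.
    """
    demand = defaultdict(float)
    for name, nodes in workflows.items():
        for node in nodes:
            demand[node.model] += request_rates[name] * node.calls_per_request
    return dict(demand)


def replicas_needed(demand, capacity):
    """Per-model scaling: size each model pool from its own load only."""
    return {m: math.ceil(rate / capacity[m]) for m, rate in demand.items()}


def monolithic_replicas(workflows, request_rates, capacity):
    """Baseline: each workflow replica carries a copy of all its models."""
    total = defaultdict(int)
    for name, nodes in workflows.items():
        # The slowest model dictates how many whole-workflow copies are needed.
        copies = max(
            math.ceil(request_rates[name] * n.calls_per_request / capacity[n.model])
            for n in nodes
        )
        for n in nodes:
            total[n.model] += copies
    return dict(total)


# Hypothetical workloads: two workflows built around the same base model.
workflows = {
    "plain": [Node("text_encoder"), Node("base_diffusion"), Node("decoder")],
    "with_adapter": [
        Node("text_encoder"),
        Node("adapter"),
        Node("base_diffusion"),
        Node("decoder"),
    ],
}
request_rates = {"plain": 4.0, "with_adapter": 2.0}  # requests per second
capacity = {  # invocations per second that one replica can sustain (made up)
    "text_encoder": 20.0,
    "adapter": 2.0,
    "base_diffusion": 1.0,
    "decoder": 10.0,
}

micro = replicas_needed(per_model_demand(workflows, request_rates), capacity)
mono = monolithic_replicas(workflows, request_rates, capacity)
print("micro-serving replicas:", micro, "total:", sum(micro.values()))
print("monolithic replicas:  ", mono, "total:", sum(mono.values()))
```

Under these made-up numbers, the monolithic baseline replicates lightweight models as often as the bottleneck model, while the per-model plan adds replicas only where load requires them. The sketch ignores placement, memory limits, and model parallelism, all of which a real scheduler would need to handle.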