This paper introduces PathSpec, a framework that accelerates autoregressive text-to-image generation by decoding with a multi-sequence draft tree rather than a single chain of draft tokens. Its two components, parallel-path speculative Jacobi decoding (PathExplore) and cross-path relaxed verification (PathRelax), expand the token search space and raise token acceptance rates, yielding speedup ratios of 4.14x, 3.95x, and 4.18x on Parti-Prompts, MSCOCO2017, and T2ICompBench, respectively. The authors report that PathExplore achieves its speedup without sacrificing image quality, and that PathRelax's relaxation mechanism can be combined with other relaxation techniques for further acceleration toward real-time use.
A roughly 4x speedup in autoregressive text-to-image generation without loss of image quality could make real-time creative applications practical.
The growing need for high-resolution image generation in autoregressive text-to-image models has led to longer token sequences, significantly increasing computational cost and inference time. Existing state-of-the-art methods for accelerating autoregressive text-to-image models rely on chain-structured draft token sequences, which leads to inefficient draft token search and limited acceptance lengths. To address this, we propose parallel-path cross-relaxed speculative Jacobi decoding (\textbf{PathSpec}), a novel framework that improves efficiency through a multi-sequence draft tree structure. Our parallel-path speculative Jacobi decoding (\textbf{PathExplore}) expands the token search space, achieving a higher speedup ratio without sacrificing image quality. Additionally, we introduce cross-path relaxed verification (\textbf{PathRelax}), which exploits semantic similarities across sequences to further boost token acceptance rates. Evaluated on the Parti-Prompts, MSCOCO2017, and T2ICompBench datasets, our method achieves speedup ratios of 4.14$\times$, 3.95$\times$, and 4.18$\times$, respectively. Remarkably, PathExplore, without any relaxed sampling, achieves a higher speedup ratio than relaxed sampling methods such as GSD and LANTERN. Moreover, PathRelax's relaxation mechanism can be seamlessly integrated with other relaxation techniques, enabling further acceleration and providing an efficient solution for real-time text-to-image generation. Our code is available at https://github.com/Haodong-Lei-Ray/PathSpec.
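To make the intuition behind multi-path drafting concrete, the following is a minimal toy sketch, not the PathSpec implementation. It assumes the standard speculative-sampling acceptance test (accept a draft token when a uniform draw falls below min(1, p/q), with p the target probability and q the draft probability) and simply keeps the candidate path with the longest accepted prefix. The function names, the probability values, and the reuse of one set of uniform draws across paths are all hypothetical choices for illustration; the sketch also ignores the question of whether picking the longest path preserves the target distribution, and it does not model PathRelax's cross-path relaxation.

```python
from typing import List, Sequence, Tuple


def accepted_prefix_length(
    target_probs: Sequence[float],
    draft_probs: Sequence[float],
    uniforms: Sequence[float],
) -> int:
    """Count leading draft tokens that pass the speculative acceptance test.

    Token i is accepted when uniforms[i] < min(1, p_i / q_i).
    Verification stops at the first rejected token.
    """
    accepted = 0
    for p, q, u in zip(target_probs, draft_probs, uniforms):
        if q <= 0.0 or u >= min(1.0, p / q):
            break
        accepted += 1
    return accepted


def best_path(
    paths: List[Tuple[Sequence[float], Sequence[float]]],
    uniforms: Sequence[float],
) -> Tuple[int, int]:
    """Return (path index, accepted length) for the longest accepted prefix.

    Each path is a (target_probs, draft_probs) pair; ties go to the
    lowest index.
    """
    lengths = [accepted_prefix_length(p, q, uniforms) for p, q in paths]
    best = max(range(len(lengths)), key=lengths.__getitem__)
    return best, lengths[best]


# Hypothetical probabilities for two draft paths of three tokens each.
example_paths = [
    ([0.9, 0.1, 0.8], [0.9, 0.5, 0.8]),  # second token has ratio 0.2
    ([0.6, 0.7, 0.2], [0.6, 0.7, 0.8]),  # third token has ratio 0.25
]
example_uniforms = [0.5, 0.5, 0.5]  # fixed draws so the example is deterministic

print(best_path(example_paths, example_uniforms))
```

With a single chain, the first path alone would yield one accepted token here; considering a second path raises the accepted length to two, which is the kind of gain in acceptance length that a wider draft search is meant to deliver.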