This paper introduces Parallel Jacobi Decoding (PJD), a training-free decoding approach that accelerates autoregressive image generation by exploiting the two-dimensional spatial correlations in images. PJD expands draft tokens in the spatial domain and adjusts the attention mask so that they can be refined in parallel, which mitigates error accumulation and improves convergence stability. Experiments show that PJD achieves 4.8x-6.4x acceleration across multiple autoregressive image generation models while maintaining competitive generation quality.
PJD reports up to 6.4x faster autoregressive image generation while maintaining competitive generation quality.
Autoregressive (AR) models have demonstrated remarkable performance in generating high-fidelity images. However, their inherently sequential next-token prediction leads to significantly slower inference. Recent studies have introduced Jacobi-style decoding to accelerate autoregressive image generation. Extending the draft sequence initially improves efficiency, yet the acceleration quickly saturates as error propagation in the one-dimensional sequence hinders convergence. Observing that images exhibit strong local spatial correlations, we propose Parallel Jacobi Decoding (PJD), a training-free decoding approach that expands draft tokens in the two-dimensional spatial domain to enable efficient spatially parallel refinement. PJD adjusts the attention mask to mitigate error accumulation and improve convergence stability. Extensive experiments on diverse datasets show that PJD achieves 4.8x-6.4x acceleration across multiple autoregressive image generation models while maintaining competitive generation quality.
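To make the Jacobi-style decoding that the abstract builds on concrete, the following is a minimal sketch of the generic one-dimensional scheme under greedy decoding. It is not the PJD algorithm: it involves no two-dimensional draft expansion or attention-mask change. The function `toy_next_token` is a hypothetical stand-in for a real model's greedy next-token prediction, and all names here are assumptions made for illustration. The sketch shows that a draft of n tokens is refined at every position in each iteration, and that the fixed point matches ordinary sequential decoding.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]


def toy_next_token(prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for a model's greedy next-token prediction."""
    # Depends on the last two tokens, so a wrong draft token affects later ones.
    a = prefix[-1] if len(prefix) >= 1 else 0
    b = prefix[-2] if len(prefix) >= 2 else 0
    return (3 * a + 5 * b + 7) % 16


def sequential_decode(next_token: NextToken, prompt: Sequence[int], n: int) -> List[int]:
    """Ordinary autoregressive decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]


def jacobi_decode(
    next_token: NextToken, prompt: Sequence[int], n: int, init: int = 0
) -> Tuple[List[int], int]:
    """Jacobi-style decoding: refine all n draft tokens until a fixed point.

    Returns the decoded tokens and the number of refinement iterations.
    """
    draft = [init] * n
    iterations = 0
    while True:
        iterations += 1
        # Every position is recomputed from the previous draft. With a real
        # model this would be a single causally masked forward pass over the
        # whole draft rather than n separate calls.
        refined = [next_token(list(prompt) + draft[:i]) for i in range(n)]
        if refined == draft:  # fixed point: no token changed
            return draft, iterations
        draft = refined


if __name__ == "__main__":
    prompt = [1, 2]
    tokens, iters = jacobi_decode(toy_next_token, prompt, n=8)
    print(tokens == sequential_decode(toy_next_token, prompt, 8), iters)
```

Each iteration fixes at least one more leading token, so the loop ends after at most n + 1 iterations. Any speedup over sequential decoding comes from several positions converging in the same iteration. The toy function above is not designed to exhibit that effect. A wrong draft token corrupts every prediction that depends on it, which is the error propagation the abstract describes for long one-dimensional drafts.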