DAMO · May 26, 2026 · arXiv:2605.26632

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Xing Cong, Hanlin Tang, Kan Liu, Lan Tao, Lin Qu, Chenhao Xie

AI Summary

RT-Lynx accelerates Diffusion Transformer (DiT) inference by applying N:M semi-structured sparsity to activations rather than weights. This exploits two properties of DiT activations: they are intrinsically sparse, and they are robust to pruning. Error-compensation techniques mitigate the accuracy loss introduced by sparsifying activations. With CUDA kernels optimized for N:M activation sparsity, the method achieves up to a 1.55x speedup in linear layers while preserving generation quality across multiple diffusion models.
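
The basic N:M operation on an activation row can be sketched in plain Python. This is a minimal illustration that assumes the common magnitude-based rule: keep the N largest-magnitude values in every group of M consecutive activations along the GEMM reduction dimension (2:4 here). The function names `prune_n_m` and `relative_error` and the toy vectors are hypothetical. This is not RT-Lynx's actual selection rule, and it includes no error compensation.

```python
from typing import List


def prune_n_m(row: List[float], n: int = 2, m: int = 4) -> List[float]:
    """Return a copy of `row` keeping only the n largest-magnitude values
    in each consecutive group of m entries (magnitude-based N:M pruning)."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    out = [0.0] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Rank positions in this group by |value|; the stable sort keeps the
        # earlier position first when magnitudes tie.
        keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            out[i] = row[i]
    return out


def relative_error(orig: List[float], pruned: List[float]) -> float:
    """L2 norm of the pruned-away part divided by the L2 norm of the input."""
    dropped = sum((a - b) ** 2 for a, b in zip(orig, pruned))
    total = sum(a * a for a in orig)
    return (dropped / total) ** 0.5 if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy vector in which most entries are near zero.
    sparse_like = [0.9, -0.05, 0.0, 1.2, 0.01, -0.7, 0.3, 0.02]
    # Hypothetical toy vector with no small entries that could be dropped cheaply.
    dense_like = [1.0, 1.0, 1.0, 1.0]
    print(prune_n_m(sparse_like))  # prints [0.9, 0.0, 0.0, 1.2, 0.0, -0.7, 0.3, 0.0]
    print(relative_error(sparse_like, prune_n_m(sparse_like)))
    print(relative_error(dense_like, prune_n_m(dense_like)))
```

On the made-up near-sparse vector, 2:4 pruning discards only the small entries, so the relative error is about 3%. On the uniform vector, it discards half of the energy (error about 71%). This mirrors, in toy form, why intrinsically sparse activations are the friendlier target.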

Key Contribution

DiT activations are far more amenable to semi-structured sparsity than weights, unlocking significant inference speedups without sacrificing generation quality.

Abstract

Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality. Our study, however, shows that DiT activations are intrinsically sparse and significantly more robust to N:M semi-structured sparsification than weights. Motivated by this observation, we advocate a paradigm shift from weight sparsification to activation sparsification. We propose RT-Lynx, which applies N:M sparsification to activations and incorporates error-compensation techniques to mitigate accuracy loss. We further implement highly optimized CUDA kernels tailored to this setting, achieving up to a 1.55x speedup on average in linear layers. Extensive experiments across multiple diffusion models demonstrate that our method preserves the generation quality of the original models while substantially accelerating inference.
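
Why 2:4 sparsity "can nearly halve FLOPs" in a linear layer y = x W^T can be shown with a small pure-Python reference model. In this model, a 2:4-sparse activation row is packed into half as many values plus a small in-group index per value. This is similar in spirit to the values-plus-metadata layout used by NVIDIA's 2:4 sparse Tensor Cores. The GEMM then performs multiply-accumulates only for the stored slots. Several things here are illustrative assumptions and are not taken from the paper: the names `compress_2_4`, `sparse_linear`, and `dense_linear`, the choice of the reduction dimension for sparsity, and the toy data. The sketch is not a CUDA kernel and does not reproduce the authors' kernels. It counts arithmetic only, so it says nothing about the 1.55x wall-clock speedup reported above.

```python
from typing import List, Tuple


def compress_2_4(x: List[float]) -> Tuple[List[float], List[int]]:
    """Pack a 2:4-sparse activation row into (values, in-group indices).

    Every group of 4 entries contributes exactly 2 slots, so the packed row
    holds len(x) // 2 values, each with an index in 0..3 (fits in 2 bits)."""
    if len(x) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    values: List[float] = []
    indices: List[int] = []
    for start in range(0, len(x), 4):
        group = x[start:start + 4]
        nonzero = [j for j, v in enumerate(group) if v != 0.0]
        if len(nonzero) > 2:
            raise ValueError("group has more than 2 nonzeros; not 2:4 sparse")
        # Groups with fewer than 2 nonzeros are padded with zero-valued slots.
        padding = [j for j in range(4) if j not in nonzero][: 2 - len(nonzero)]
        for j in sorted(nonzero + padding):
            values.append(group[j])
            indices.append(j)
    return values, indices


def sparse_linear(values: List[float], indices: List[int],
                  weight: List[List[float]]) -> Tuple[List[float], int]:
    """y[o] = sum_k x[k] * weight[o][k], touching only the stored slots of x.

    Returns (y, number of multiply-accumulates performed)."""
    y: List[float] = []
    macs = 0
    for w_row in weight:
        acc = 0.0
        for slot, (v, j) in enumerate(zip(values, indices)):
            k = (slot // 2) * 4 + j  # recover the dense column index
            acc += v * w_row[k]
            macs += 1
        y.append(acc)
    return y, macs


def dense_linear(x: List[float], weight: List[List[float]]) -> Tuple[List[float], int]:
    """Dense reference for the same y = x W^T, with its multiply-accumulate count."""
    y = [sum(a * b for a, b in zip(x, w_row)) for w_row in weight]
    return y, len(x) * len(weight)


if __name__ == "__main__":
    x = [0.9, 0.0, 0.0, 1.2, 0.0, -0.7, 0.3, 0.0]  # already 2:4 sparse
    weight = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
              [0.5] * 8,
              [-1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]]
    vals, idx = compress_2_4(x)
    y_sparse, macs_sparse = sparse_linear(vals, idx, weight)
    y_dense, macs_dense = dense_linear(x, weight)
    print(macs_sparse, macs_dense)  # prints 12 24
```

Both paths produce the same outputs, but the packed path performs half the multiply-accumulates. In a real deployment, the activations must also be pruned and packed at run time for every input, so that overhead eats into the ideal 2x. This is presumably why the abstract speaks of "nearly" halving FLOPs.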

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
