Mar 5, 2026 | arXiv:2603.05315

Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers

AI Summary

The paper introduces SpectralCache, a caching framework that accelerates Diffusion Transformers (DiTs) by exploiting non-uniform sensitivity to caching errors across time, depth, and feature dimensions during denoising. It combines Timestep-Aware Dynamic Scheduling (TADS), Cumulative Error Budgets (CEB), and Frequency-Decomposed Caching (FDC) to decide when and what to cache. On FLUX.1-schnell at 512x512 resolution, SpectralCache reaches a 2.46x speedup with comparable image quality, about 16% faster than TeaCache (2.12x).

Key Contribution

Achieves 16% faster DiT inference than TeaCache (2.46x vs. 2.12x speedup) by caching the right features at the right denoising steps, with comparable image quality (LPIPS difference <1%).

Abstract

Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation, yet their iterative denoising process incurs substantial computational cost during inference. Existing caching methods accelerate DiTs by reusing intermediate computations across timesteps, but they share a common limitation: treating the denoising process as uniform across time, depth, and feature dimensions. In this work, we identify three orthogonal axes of non-uniformity in DiT denoising: (1) temporal -- sensitivity to caching errors varies dramatically across the denoising trajectory; (2) depth -- consecutive caching decisions lead to cascading approximation errors; and (3) feature -- different components of the hidden state exhibit heterogeneous temporal dynamics. Based on these observations, we propose SpectralCache, a unified caching framework comprising Timestep-Aware Dynamic Scheduling (TADS), Cumulative Error Budgets (CEB), and Frequency-Decomposed Caching (FDC). On FLUX.1-schnell at 512x512 resolution, SpectralCache achieves a 2.46x speedup with LPIPS 0.217 and SSIM 0.727, outperforming TeaCache (2.12x, LPIPS 0.215, SSIM 0.734) by 16% in speed while maintaining comparable quality (LPIPS difference < 1%). Our approach is training-free, plug-and-play, and compatible with existing DiT architectures.
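The abstract names the three components but does not say how they are implemented. As a rough illustration only, the Python sketch below shows one way a cumulative error budget with a per-timestep threshold could gate cache reuse, alongside a simple FFT-based split of a hidden state into low- and high-frequency parts. All names (`split_bands`, `rel_change`, `run_with_budget`, `budget_for_step`), the relative-L1 change proxy, the residual-reuse rule, and the FFT split are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def split_bands(x, cutoff):
    """Split x (tokens, channels) into low/high-frequency parts along the token axis.

    Hypothetical helper: keeps the first `cutoff` rFFT bins as the "low" band.
    """
    spec = np.fft.rfft(x, axis=0)
    low_spec = spec.copy()
    low_spec[cutoff:] = 0
    low = np.fft.irfft(low_spec, n=x.shape[0], axis=0)
    return low, x - low  # the two parts sum back to x

def rel_change(cur, ref):
    """Relative L1 change between two tensors (an assumed, cheap error proxy)."""
    return float(np.abs(cur - ref).mean() / (np.abs(ref).mean() + 1e-8))

def run_with_budget(inputs, block, budget_for_step):
    """Recompute `block` only when the accumulated change exceeds the step's budget.

    Returns (outputs, computed), where computed[t] is True if step t was recomputed.
    """
    outputs, computed = [], []
    cached_residual, prev_input, accumulated = None, None, 0.0
    for t, x in enumerate(inputs):
        if cached_residual is not None:
            accumulated += rel_change(x, prev_input)
        if cached_residual is None or accumulated > budget_for_step(t):
            out = block(x)                 # full computation
            cached_residual = out - x      # cache the block's residual
            accumulated = 0.0              # reset the error budget
            computed.append(True)
        else:
            out = x + cached_residual      # reuse the cached residual
            computed.append(False)
        prev_input = x
        outputs.append(out)
    return outputs, computed

# Toy usage: six "timesteps" whose input drifts by 0.01 per step.
inputs = [np.ones((8, 4)) * (1 + 0.01 * t) for t in range(6)]
outputs, computed = run_with_budget(inputs, lambda x: 2 * x, lambda t: 0.025)
print(computed)  # [True, False, False, True, False, False]
```

In this toy run the budget is constant; a timestep-aware variant would make `budget_for_step` return a different threshold at different points of the denoising trajectory, and a frequency-decomposed variant could apply separate budgets to the two outputs of `split_bands`.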

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: N/A
