The paper introduces Relational Feature Caching (RFC) to accelerate Diffusion Transformers (DiTs) by exploiting the relationship between the input and output features of computationally expensive modules. The authors find that the prediction errors of existing forecasting-based caching methods stem from irregular changes in the magnitude of the output features, and they propose relational feature estimation (RFE) to predict these changes from the inputs. By further introducing relational cache scheduling (RCS), which performs full computations selectively based on prediction errors estimated from the inputs, RFC achieves significant improvements over previous caching techniques.
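The contrast between purely temporal forecasting and an input-aware estimate can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's RFE formula. It assumes, hypothetically, that the direction of the output change is taken from the two most recent cached outputs, while its magnitude is rescaled by how far the module's input (assumed cheap to observe at the current step) has moved, relative to the input change between those cached steps. The names `temporal_extrapolate`, `relational_estimate`, and `module` are illustrative only.

```python
import numpy as np


def temporal_extrapolate(y_prev, y_prevprev):
    # Forecasting baseline: assume the output changes by the same amount
    # as it did between the two most recent cached steps.
    return y_prev + (y_prev - y_prevprev)


def relational_estimate(y_prev, y_prevprev, x_t, x_prev, x_prevprev, eps=1e-8):
    # Hypothetical relational variant (not the paper's exact rule): keep the
    # direction of the cached output change, but rescale its magnitude by the
    # ratio of the current input change to the previous input change.
    dy = y_prev - y_prevprev
    ratio = np.linalg.norm(x_t - x_prev) / (np.linalg.norm(x_prev - x_prevprev) + eps)
    return y_prev + ratio * dy


# Toy linear "module" so that the input-output relation holds exactly.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))


def module(x):
    return W @ x


u = rng.standard_normal(8)
# Irregular step: the latest input change is 3x larger than the previous one.
x_prevprev, x_prev, x_t = np.zeros(8), u, 4.0 * u
y_prevprev, y_prev, y_true = module(x_prevprev), module(x_prev), module(x_t)

err_temporal = np.linalg.norm(temporal_extrapolate(y_prev, y_prevprev) - y_true)
err_relational = np.linalg.norm(
    relational_estimate(y_prev, y_prevprev, x_t, x_prev, x_prevprev) - y_true
)
print(f"temporal: {err_temporal:.3f}, relational: {err_relational:.2e}")
```

The linear module and collinear input steps are deliberately favorable, so the relational estimate is nearly exact here while the uniform-step extrapolation misses by a wide margin. Real DiT blocks are nonlinear and their inputs do not move along a single direction, so in practice such an input-output relation would hold only approximately.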
By relating input features to output feature changes, RFC substantially improves the accuracy of the predicted features in cached diffusion transformers, leading to faster and more efficient image generation.
Feature caching approaches accelerate diffusion transformers (DiTs) by storing the output features of computationally expensive modules at certain timesteps and reusing them in subsequent steps to reduce redundant computation. Recent forecasting-based caching approaches employ temporal extrapolation techniques to approximate the output features from cached ones. Although effective, approaches that rely exclusively on temporal extrapolation still suffer from significant prediction errors, which degrade performance. Through a detailed analysis, we find that 1) these errors stem from the irregular magnitude of changes in the output features, and 2) the input feature of a module is strongly correlated with its corresponding output. Based on this, we propose relational feature caching (RFC), a novel framework that leverages the input-output relationship to improve the accuracy of feature prediction. Specifically, we introduce relational feature estimation (RFE), which estimates the magnitude of changes in the output features from the inputs, enabling more accurate feature predictions. We also present relational cache scheduling (RCS), which estimates the prediction errors using the input features and performs full computations only when the errors are expected to be substantial. Extensive experiments across various DiT models demonstrate that RFC consistently and significantly outperforms prior approaches. The project page is available at https://cvlab.yonsei.ac.kr/projects/RFC
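The scheduling idea of performing full computations only when the estimated error is large can be sketched as follows. This is a hypothetical illustration, not the paper's RCS rule: as an assumption, it uses the error of linearly extrapolating the module's input, which is observable at every step, as a proxy for the unobservable error of predicting the module's output, and it triggers a full computation when the relative error exceeds a threshold. The names `schedule_full_steps` and `threshold` are illustrative only.

```python
import numpy as np


def schedule_full_steps(inputs, threshold):
    """Hypothetical RCS-style scheduler (illustrative, not the paper's rule).

    inputs: per-timestep input features of the expensive module, assumed
    cheap to observe at every step. Returns the timesteps at which the
    module would be fully computed; all other steps would reuse or
    predict cached outputs.
    """
    # Warm-up: fully compute the first two steps to seed the cache.
    full_steps = list(range(min(2, len(inputs))))
    for t in range(2, len(inputs)):
        a, b = full_steps[-2], full_steps[-1]
        # Linearly extrapolate the input from the two most recent full steps.
        slope = (inputs[b] - inputs[a]) / (b - a)
        x_pred = inputs[b] + slope * (t - b)
        # Relative input-side extrapolation error, used as a proxy for the
        # error of predicting the module's output at this step.
        err = np.linalg.norm(inputs[t] - x_pred) / (np.linalg.norm(inputs[t]) + 1e-8)
        if err > threshold:
            full_steps.append(t)  # expected error too large: recompute fully
    return full_steps


# Smoothly evolving inputs: the extrapolation is exact, so no extra full steps.
smooth = [np.full(4, float(t)) for t in range(10)]
# Inputs with an abrupt jump at step 5.
jumpy = [np.full(4, float(t + (10 if t >= 5 else 0))) for t in range(10)]

smooth_steps = schedule_full_steps(smooth, threshold=0.1)
jumpy_steps = schedule_full_steps(jumpy, threshold=0.1)
print(smooth_steps)  # [0, 1]
print(jumpy_steps)   # [0, 1, 5, 6]
```

In the jumpy case, the extra full computation at step 6 occurs because the extrapolation slope fitted between steps 1 and 5 is still distorted by the jump. Raising `threshold` in this sketch trades prediction accuracy for fewer full computations.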