Shanghai AI Lab | Mar 4, 2026 | arXiv:2603.03792

TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration

Haowei Zhu, Tingxuan Huang, Xing Wang, Tianyu Zhao, Jiexi Wang, Weifeng Chen, Xurui Peng, Fangmin Chen, Junhai Yong, Bin Wang

AI Summary

The paper introduces Token-Adaptive Predictor (TAP), a training-free framework that accelerates diffusion model inference by adaptively selecting a predictor for each token at every sampling step. TAP uses the output of the model's first layer as a probe to compute proxy losses for a set of candidate predictors (Taylor expansions of varying order and horizon), then assigns each token the predictor with the smallest proxy error. Experiments across multiple diffusion architectures show that TAP substantially improves the accuracy-efficiency trade-off compared to fixed global predictors and caching-only baselines.

Key Contribution

TAP achieves significant inference speedups for diffusion models without any training by selecting, for each token at each sampling step, the predictor that performs best on a low-cost probe of the model's first layer.

Abstract

Diffusion models achieve strong generative performance but remain slow at inference due to the need for repeated full-model denoising passes. We present Token-Adaptive Predictor (TAP), a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step. TAP uses a single full evaluation of the model's first layer as a low-cost probe to compute proxy losses for a compact family of candidate predictors (instantiated primarily with Taylor expansions of varying order and horizon), then assigns each token the predictor with the smallest proxy error. This per-token "probe-then-select" strategy exploits heterogeneous temporal dynamics, requires no additional training, and is compatible with various predictor designs. TAP incurs negligible overhead while enabling large speedups with little or no perceptual quality loss. Extensive experiments across multiple diffusion architectures and generation tasks show that TAP substantially improves the accuracy-efficiency frontier compared to fixed global predictors and caching-only baselines.
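The "probe-then-select" idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`taylor_predict`, `probe_then_select`), the use of backward finite differences as a stand-in for Taylor expansions, the squared-error proxy loss, and the assumption of equally spaced steps with a fixed one-step horizon are all illustrative choices made here.

```python
import numpy as np


def taylor_predict(history, order):
    """Extrapolate one step ahead from equally spaced past values.

    history: list of arrays, oldest first, each of shape (tokens, dim).
    Order 0 reuses the last value, order 1 adds the first backward
    difference, order 2 also adds the second backward difference.
    (Hypothetical helper; a finite-difference stand-in for a Taylor predictor.)
    """
    pred = np.array(history[-1], dtype=float)  # copy as float
    if order >= 1:
        pred += history[-1] - history[-2]
    if order >= 2:
        pred += history[-1] - 2.0 * history[-2] + history[-3]
    return pred


def probe_then_select(probe_history, probe_now, feature_history, orders=(0, 1, 2)):
    """Pick, per token, the candidate with the smallest error on the probe.

    probe_history:   past first-layer outputs, oldest first.
    probe_now:       first-layer output actually computed at this step.
    feature_history: past cached deep features to be extrapolated.
    Returns the per-token predicted features and the chosen candidate index.
    """
    # Each candidate predicts the probe; compare against the real probe.
    probe_preds = np.stack([taylor_predict(probe_history, k) for k in orders])
    proxy_loss = ((probe_preds - probe_now[None]) ** 2).sum(axis=-1)  # (K, tokens)
    choice = proxy_loss.argmin(axis=0)  # (tokens,)

    # Apply the selected candidate to the cached deep features, token by token.
    feat_preds = np.stack([taylor_predict(feature_history, k) for k in orders])
    tokens = np.arange(choice.shape[0])
    return feat_preds[choice, tokens], choice


# Toy data: token 0 is constant over time, token 1 follows t**2.
probe_history = [np.array([[1.0], [0.0]]),
                 np.array([[1.0], [1.0]]),
                 np.array([[1.0], [4.0]])]
probe_now = np.array([[1.0], [9.0]])
feature_history = [2.0 * p for p in probe_history]  # assumed to share the dynamics

predicted, choice = probe_then_select(probe_history, probe_now, feature_history)
print(choice)     # token 0 -> candidate index 0, token 1 -> candidate index 2
print(predicted)  # extrapolated deep features, shape (2, 1)
```

In this toy setup the static token is served by the cheapest candidate (plain reuse), while the quickly changing token gets the second-order candidate, which is the kind of heterogeneous temporal behaviour the abstract says TAP exploits.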

Computer Vision, Inference & Quantization, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
