NUS | Mar 6, 2026 | arXiv:2603.05811

Training-free Latent Inter-Frame Pruning with Attention Recovery

Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu

AI Summary

The paper introduces Latent Inter-frame Pruning with Attention Recovery (LIPAR), a training-free framework to reduce the computational cost of video generation by identifying and skipping redundant latent patches across frames. LIPAR incorporates an attention recovery mechanism to mitigate visual artifacts caused by pruning, approximating the attention values of pruned tokens. Experiments show LIPAR improves video editing throughput by 1.45x, achieving 12.2 FPS on an NVIDIA A6000 without compromising generation quality.

Key Contribution

Increases video editing throughput by 1.45x (from 8.4 to 12.2 FPS on an NVIDIA A6000) without retraining, by skipping redundant latent patches across frames and approximating the attention values of the pruned tokens.

Abstract

Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent patches. To this end, we propose the Latent Inter-frame Pruning with Attention Recovery (LIPAR) framework, which detects and skips recomputing duplicated latent patches. Additionally, we introduce a novel Attention Recovery mechanism that approximates the attention values of pruned tokens, thereby removing visual artifacts arising from naively applying the pruning method. Empirically, our method increases video editing throughput by $1.45\times$, on average achieving 12.2 FPS on an NVIDIA A6000 compared to the baseline 8.4 FPS. The proposed method does not compromise generation quality and can be seamlessly integrated with the model without additional training. Our approach effectively bridges the gap between traditional compression algorithms and modern generative pipelines.
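
The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`redundant_mask`, `process_frame`), the mean-absolute-difference criterion, and the `threshold` parameter are all hypothetical choices made for illustration, and the sketch covers only the naive skip-and-reuse step. It omits the Attention Recovery mechanism, which the abstract says is needed to avoid the visual artifacts that naive pruning causes.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]


def redundant_mask(prev_frame: Sequence[Patch],
                   cur_frame: Sequence[Patch],
                   threshold: float) -> List[bool]:
    """Flag patches that are near-duplicates of the co-located patch in the
    previous frame. The criterion (mean absolute difference below a
    threshold) is an assumption, not taken from the paper."""
    mask = []
    for prev_patch, cur_patch in zip(prev_frame, cur_frame):
        diff = sum(abs(a - b) for a, b in zip(prev_patch, cur_patch))
        mask.append(diff / len(cur_patch) < threshold)
    return mask


def process_frame(cur_frame: Sequence[Patch],
                  mask: Sequence[bool],
                  cached_outputs: Sequence[List[float]],
                  expensive_fn: Callable[[Patch], List[float]]
                  ) -> Tuple[List[List[float]], int]:
    """Recompute only non-redundant patches; reuse cached results otherwise.
    Returns the per-patch outputs and the number of patches recomputed."""
    outputs, recomputed = [], 0
    for patch, is_redundant, cached in zip(cur_frame, mask, cached_outputs):
        if is_redundant:
            outputs.append(cached)          # skip: reuse previous result
        else:
            outputs.append(expensive_fn(patch))
            recomputed += 1
    return outputs, recomputed


# Toy stand-in for an expensive per-token model computation.
def toy_model(patch: Patch) -> List[float]:
    return [2.0 * x for x in patch]


prev = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
cur = [[0.0, 0.01], [1.5, 1.0], [2.0, 2.0]]
cache = [toy_model(p) for p in prev]

mask = redundant_mask(prev, cur, threshold=0.05)
outputs, recomputed = process_frame(cur, mask, cache, toy_model)
print(mask, recomputed)
```

In this toy example only the middle patch changes enough to be recomputed, so one of three patches incurs the model cost. In a real attention-based model the skipped tokens would also be missing from the attention computation of the remaining tokens, which is the gap the paper's Attention Recovery mechanism is described as addressing.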

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 33

Year: 2026

Venue: N/A
