Mar 2, 2026 · arXiv:2603.01685

FastLightGen: Fast and Light Video Generation with Fewer Steps and Parameters

AI Summary

The paper introduces FastLightGen, a novel algorithm for distilling large video generation models into smaller, faster versions by simultaneously reducing sampling steps and model parameters. It addresses the high computational cost of state-of-the-art video generation models such as Hunyuan and WanX. By constructing an optimal teacher model and using it in a synergistic distillation framework, FastLightGen achieves state-of-the-art performance in efficient video generation, demonstrated on HunyuanVideo-ATI2V and WanX-TI2V with 4-step sampling and 30% parameter pruning.

Key Contribution

Squeezing large video generation models yields surprisingly efficient results: a 4-step sampling model with 30% fewer parameters can outperform the original.

Abstract

The recent advent of powerful video generation models, such as Hunyuan, WanX, Veo3, and Kling, has inaugurated a new era in the field. However, the practical deployment of these models is severely impeded by their substantial computational overhead, which stems from enormous parameter counts and the iterative, multi-step sampling process required during inference. Prior research on accelerating generative models has predominantly followed two distinct trajectories: reducing the number of sampling steps (e.g., LCM, DMD, and MagicDistillation) or compressing the model size for more efficient inference (e.g., ICMD). The potential of simultaneously compressing both to create a fast and lightweight model remains an unexplored avenue. In this paper, we propose FastLightGen, an algorithm that transforms large, computationally expensive models into fast, lightweight counterparts. The core idea is to construct an optimal teacher model, one engineered to maximize student performance, within a synergistic framework for distilling both model size and inference steps. Our extensive experiments on HunyuanVideo-ATI2V and WanX-TI2V reveal that a generator using 4-step sampling and 30% parameter pruning achieves optimal visual quality under a constrained inference budget. Furthermore, FastLightGen consistently outperforms all competing methods, establishing a new state-of-the-art in efficient video generation.
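To get a feel for what "4-step sampling and 30% parameter pruning" could mean for an inference budget, the sketch below uses a deliberately crude proxy: cost is assumed to scale linearly with both the number of sampling steps and the fraction of parameters kept. This proxy is an assumption for illustration, not a result from the paper; it ignores guidance passes, sequence length, and memory effects. The 50-step baseline and the names `relative_cost` and `BASE_STEPS` are hypothetical and do not come from the abstract.

```python
def relative_cost(steps: int, param_fraction: float, base_steps: int,
                  base_param_fraction: float = 1.0) -> float:
    """Cost of a distilled sampler relative to a baseline.

    Assumes (as a rough proxy only) that cost is proportional to
    sampling steps times the fraction of parameters retained.
    """
    if steps <= 0 or base_steps <= 0:
        raise ValueError("step counts must be positive")
    if not (0.0 < param_fraction <= 1.0 and 0.0 < base_param_fraction <= 1.0):
        raise ValueError("parameter fractions must lie in (0, 1]")
    return (steps * param_fraction) / (base_steps * base_param_fraction)


# Hypothetical baseline step count; the abstract does not state one.
BASE_STEPS = 50

# 4-step sampling with 30% of parameters pruned, i.e. 70% retained.
ratio = relative_cost(steps=4, param_fraction=0.70, base_steps=BASE_STEPS)
print(f"relative cost: {ratio:.3f}  (about {1 / ratio:.1f}x cheaper under this proxy)")
```

Under this proxy the step reduction dominates the saving, and pruning contributes a further constant factor on top; actual speedups depend on the model and hardware.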

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
