HKUST | Tencent AI | May 28, 2026 | arXiv:2605.30116

SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation

Zhuguanyu Wu, Ruihao Gong, Yang Yong, Yushi Huang, Xiangyu Fan, Dahua Lin, Xianglong Liu

AI Summary

This paper introduces Score Gradient Matching Distillation (SGMD), an approach to distilling video diffusion models into few-step generators. SGMD optimizes the fake score directly toward the teacher, using a teacher stop-gradient Fisher divergence as the distribution-matching objective, with the aim of making training more stable and faster. Experiments show that SGMD achieves an approximately 3x training speedup over DMD2 and substantially improves motion dynamics in 4-step distilled models while preserving temporal consistency. A human study finds SGMD preferred for motion quality and overall preference.

Key Contribution

SGMD distills video diffusion models into 4-step generators with roughly 3x faster training than DMD2 and stronger motion dynamics, by optimizing the fake score directly toward the teacher.

Abstract

Distribution Matching Distillation (DMD) is a widely used paradigm for accelerating inference in few-step video diffusion models. However, DMD-style video distillation faces two coupled challenges. First, the fake score must track a continuously evolving generator, which makes training costly when frequent updates are required. Second, reverse-KL-style matching can be mode-seeking and too conservative to preserve strong motion dynamics. To address these issues, we propose Score Gradient Matching Distillation (SGMD). SGMD adopts a fake-score perspective by directly optimizing the fake score toward the teacher, while using a teacher stop-gradient Fisher divergence as a stable distribution-matching objective. We provide a gradient analysis that motivates this objective choice under ideal tracking. Building on this, SGMD introduces a pair of dual potentials: negative-residual (NR) for outer-loop correction and residual-contraction (RC) for inner-loop tracking. Empirically, compared to DMD2, SGMD achieves an approximately $3\times$ training speedup and substantially improves motion dynamics for 4-step distilled models while preserving temporal consistency. A human study confirms that SGMD is preferred in motion quality and overall preference, while visual quality and text alignment remain comparable. Code is available at https://github.com/ModelTC/LightX2V.
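
For orientation, the two divergences contrasted in the abstract can be written in their standard textbook forms. This is only a sketch under assumptions: the symbols $p_\theta$ (generator distribution) and $p_{\mathrm{T}}$ (teacher distribution) are notation introduced here, not taken from the paper. The abstract does not state the exact SGMD objective, the NR and RC potentials, or where the stop-gradient is applied, so none of that is reproduced below.

```latex
% Reverse KL divergence, the quantity behind DMD-style matching.
% The expectation is over generator samples, which is why it tends to be mode-seeking.
D_{\mathrm{KL}}\left(p_\theta \,\|\, p_{\mathrm{T}}\right)
  = \mathbb{E}_{x \sim p_\theta}\left[ \log p_\theta(x) - \log p_{\mathrm{T}}(x) \right]

% Fisher divergence (up to a constant factor that varies by convention).
% It compares scores, i.e. gradients of the log-densities, rather than densities.
D_{\mathrm{F}}\left(p_\theta \,\|\, p_{\mathrm{T}}\right)
  = \mathbb{E}_{x \sim p_\theta}\left[
      \left\| \nabla_x \log p_\theta(x) - \nabla_x \log p_{\mathrm{T}}(x) \right\|_2^2
    \right]
```

In a diffusion setting both scores would typically be evaluated on noised samples at a given noise level; that detail is omitted here because the abstract does not specify it.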

Computer Vision | Inference & Quantization | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
