Cohere, NJU | Mar 18, 2026 | arXiv:2603.17426

SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning

Xi Ye, Wenjia Yang, Yangyang Xu, Xiaoyang Liu, Duo Su, Mengfei Xia, Jun Zhu

AI Summary

The paper introduces Smooth Hybrid Fine-tuning (SHIFT), a reward-driven fine-tuning framework for image-conditioned video diffusion models, designed to improve motion alignment. SHIFT uses pixel-motion rewards based on pixel flux dynamics to capture both instantaneous and long-term motion consistency, addressing the problem of weakened motion fidelity after fine-tuning. By fusing supervised fine-tuning with advantage-weighted fine-tuning using adversarial advantages, SHIFT achieves faster convergence and reduces reward hacking, leading to improved motion dynamics and temporal coherence.

Key Contribution

Image-conditioned video diffusion models can now be fine-tuned to produce more realistic motion dynamics and long-term temporal coherence via a novel reward-driven approach that avoids common pitfalls like reward hacking.

Abstract

Image-conditioned video diffusion models achieve impressive visual realism but often suffer from weakened motion fidelity, e.g., reduced motion dynamics or degraded long-term temporal coherence, especially after fine-tuning. We study the problem of motion alignment in the post-training of video diffusion models. To address it, we introduce pixel-motion rewards based on pixel flux dynamics, which capture both instantaneous and long-term motion consistency. We further propose Smooth Hybrid Fine-tuning (SHIFT), a scalable reward-driven fine-tuning framework for video diffusion models. SHIFT fuses standard supervised fine-tuning and advantage-weighted fine-tuning into a unified framework. Benefiting from novel adversarial advantages, SHIFT improves convergence speed and mitigates reward hacking. Experiments show that our approach efficiently resolves the dynamic-degree collapse that arises during supervised fine-tuning of modern video diffusion models.
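The abstract does not give SHIFT's objective, so the following is only a minimal sketch of the general idea of blending supervised fine-tuning with advantage-weighted fine-tuning, not the paper's formulation. Everything in it is an assumption made for illustration: the function names, the mixing coefficient `lam`, the temperature `beta`, the batch-mean baseline, and the exponential weighting. The adversarial advantages mentioned in the abstract are not modeled here.

```python
import math


def advantages(rewards):
    """Center rewards on the batch mean (an assumed baseline, not from the paper)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def hybrid_weights(rewards, lam, beta=1.0):
    """Per-sample loss weights interpolating between two regimes.

    lam = 0.0 -> uniform weights, i.e. plain supervised fine-tuning.
    lam = 1.0 -> exponentiated-advantage weights, normalized to mean 1.
    """
    raw = [math.exp(a / beta) for a in advantages(rewards)]
    mean_raw = sum(raw) / len(raw)
    adv_weights = [x / mean_raw for x in raw]
    return [(1.0 - lam) + lam * w for w in adv_weights]


def hybrid_loss(per_sample_losses, rewards, lam, beta=1.0):
    """Weighted mean of per-sample losses (e.g. per-video denoising losses)."""
    weights = hybrid_weights(rewards, lam, beta)
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)


if __name__ == "__main__":
    losses = [0.8, 0.6, 0.7]          # hypothetical per-sample losses
    motion_rewards = [0.1, 0.5, 0.9]  # hypothetical motion reward scores
    for lam in (0.0, 0.5, 1.0):
        print(lam, hybrid_loss(losses, motion_rewards, lam))
```

In this sketch, raising `lam` shifts the training signal from treating all samples equally toward emphasizing samples with higher motion reward, while the mean-1 normalization keeps the overall loss scale comparable across values of `lam`.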

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
