The paper introduces RefVFX, a framework for transferring complex temporal visual effects from a reference video to a target video or image in a feed-forward manner. To train the model, the authors created a large-scale dataset of video triplets using a novel automated pipeline that preserves input motion while applying repeatable effects, augmented with LoRA-derived and programmatically generated data. Experiments demonstrate that RefVFX generalizes to unseen effects, produces temporally coherent edits, and outperforms text-prompt baselines.
Forget tedious prompt engineering – RefVFX lets you copy and paste visual effects between videos with a single reference clip.
We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While existing methods excel at prompt-based or keyframe-conditioned editing, they struggle with dynamic temporal effects such as lighting changes or character transformations, which are difficult to describe via text or static conditions. Transferring a video effect is challenging, as the model must integrate the new temporal dynamics with the input video's existing motion and appearance. To address this, we introduce a large-scale dataset of triplets, where each triplet consists of a reference effect video, an input image or video, and a corresponding output video depicting the transferred effect. Creating this data is non-trivial, especially the video-to-video effect triplets, which do not exist naturally. To generate these, we propose a scalable automated pipeline that creates high-quality paired videos designed to preserve the input's motion and structure while transforming it based on a fixed, repeatable effect. We then augment this data with image-to-video effects derived from LoRA adapters and code-based temporal effects generated through programmatic composition. Building on our new dataset, we train our reference-conditioned model using recent text-to-video backbones. Experimental results demonstrate that RefVFX produces visually consistent and temporally coherent edits, generalizes across unseen effect categories, and outperforms prompt-only baselines in both quantitative metrics and human preference. See our website at https://tuningfreevisualeffects-maker.github.io/Tuning-free-Visual-Effect-Transfer-across-Videos-Project-Page/
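As a rough illustration of the training data described in the abstract, each triplet pairs a reference effect video with an input (image or video) and the corresponding output showing the transferred effect. The sketch below uses hypothetical field names and file paths; it is not from the RefVFX codebase, only a minimal rendering of the described structure.

```python
from dataclasses import dataclass

# Hypothetical sketch of one training triplet as described in the abstract.
# Field names and paths are illustrative assumptions, not the authors' schema.
@dataclass
class EffectTriplet:
    reference_effect: str  # video demonstrating the effect to transfer
    target_input: str      # input image or video to be edited
    expected_output: str   # video depicting the transferred effect
    source: str            # which data stream produced this triplet:
                           # "paired-pipeline", "lora-derived", or "programmatic"

triplet = EffectTriplet(
    reference_effect="ref/lighting_shift.mp4",
    target_input="inputs/dog_running.mp4",
    expected_output="outputs/dog_running_lighting_shift.mp4",
    source="paired-pipeline",
)
```

The three `source` values mirror the abstract's three data streams: the automated paired-video pipeline, image-to-video effects derived from LoRA adapters, and programmatically composed temporal effects.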