The paper introduces SWIFT, a method for attributing generated videos to their source model without training an attribution model or modifying the video. SWIFT exploits the temporal structure of video generation models, specifically the mapping from multiple pixel frames to a single latent frame within each video chunk, and uses a fixed-length sliding window to perform two reconstructions: normal and corrupted. The difference between the two reconstruction losses serves as the attribution signal. SWIFT achieves high accuracy with few samples and enables zero-shot attribution for some models.
SWIFT attributes generated videos to their source model with over 90% average accuracy using as few as 20 samples, without training an attribution model or modifying the videos.
Video generation technologies have advanced significantly and are now widely applied across multiple domains. However, concerns over the potential misuse of generated content have been mounting. Tracing the origin of generated videos has become crucial for mitigating misuse and identifying responsible parties. Existing video attribution methods require either additional operations, which may degrade video quality, or the training of source attribution models, which may necessitate large numbers of training samples. To address these challenges, we define for the first time the "few-shot training-free generated video attribution" task and propose SWIFT, which is tightly integrated with the temporal characteristics of video. By leveraging the "Pixel Frames (many) to Latent Frame (one)" temporal mapping within each video chunk, SWIFT applies a fixed-length sliding window to perform two distinct reconstructions: normal and corrupted. The variation in the losses between the two reconstructions is then used as the attribution signal. We conducted an extensive evaluation on five state-of-the-art (SOTA) video generation models. Experimental results show that SWIFT achieves over 90% average attribution accuracy with merely 20 video samples across all models and even enables zero-shot attribution for HunyuanVideo, EasyAnimate, and Wan2.2. Our source code is available at https://github.com/wangchao0708/SWIFT.
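The sliding-window comparison described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the autoencoder, the corruption, or the loss, so everything here is an assumption. `toy_reconstruct` is a hypothetical stand-in for a video autoencoder that averages a window of pixel frames into one latent frame and repeats it on decoding. `toy_corrupt` is a hypothetical corruption that blanks the first frame. The signal is assumed to be the mean gap between corrupted and normal mean-squared reconstruction error.

```python
import numpy as np


def mse(a, b):
    """Mean-squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def toy_reconstruct(window):
    # ASSUMPTION: toy codec, not a real video VAE.
    # Encode: many pixel frames -> one latent frame (temporal mean).
    latent = window.mean(axis=0, keepdims=True)
    # Decode: one latent frame -> many pixel frames (repeat in time).
    return np.repeat(latent, window.shape[0], axis=0)


def toy_corrupt(window):
    # ASSUMPTION: hypothetical corruption; the abstract does not define one.
    corrupted = window.copy()
    corrupted[0] = 0.0  # blank the first frame of the window
    return corrupted


def attribution_signal(video, reconstruct, corrupt, window=4, stride=4):
    """Mean (corrupted loss - normal loss) over fixed-length sliding windows."""
    gaps = []
    for start in range(0, video.shape[0] - window + 1, stride):
        clip = video[start:start + window]
        normal_loss = mse(clip, reconstruct(clip))
        bad = corrupt(clip)
        corrupted_loss = mse(bad, reconstruct(bad))
        gaps.append(corrupted_loss - normal_loss)
    return float(np.mean(gaps))


# Toy video of 12 frames (8x8 pixels) whose 4-frame chunks are constant in
# time, so the toy codec reconstructs each aligned window almost exactly.
rng = np.random.default_rng(0)
chunk_values = rng.uniform(0.5, 1.0, size=(3, 1, 8, 8))
video = np.repeat(chunk_values, 4, axis=1).reshape(12, 8, 8)

signal = attribution_signal(video, toy_reconstruct, toy_corrupt)
print(f"toy attribution signal: {signal:.4f}")
```

In this toy setup the normal reconstruction loss is essentially zero, so the signal is driven by how badly the corruption breaks the many-to-one frame mapping. How such a signal is turned into an attribution decision (for example, calibration on the 20 samples mentioned above) is not described in the abstract and is left out here.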