Mar 16, 2026

arXiv:2603.15026

Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

Omer Ben Hayun, Roy Betser, Meir Yossef Levi, Levi Kassel, Guy Gilboa

AI Summary

The paper introduces STALL, a training-free detector of AI-generated videos that scores each video by how likely its spatial and temporal features are under statistics drawn from real videos. STALL models spatial and temporal evidence jointly within a probabilistic framework, which addresses two weaknesses of prior work: image detectors operate per frame and ignore temporal dynamics, and supervised video detectors generalize poorly to unseen generators. Experiments on two public benchmarks and a new benchmark, ComGenVid, show that STALL outperforms existing image- and video-based baselines.

Key Contribution

STALL is a training-free method that needs no synthetic training data: it scores videos against spatial-temporal likelihoods derived from real data and outperforms prior image- and video-based baselines at detecting AI-generated videos.

Abstract

Following major advances in text and image generation, the video domain has surged, producing highly realistic and controllable sequences. Along with this progress, these models also raise serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce STALL, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at https://omerbenhayun.github.io/stall-video.
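
The idea of scoring content against real-data statistics can be illustrated with a small sketch. This is not the STALL implementation and is not taken from the paper: the Gaussian density model, the use of frame-to-frame embedding differences as the temporal feature, and every function name below are assumptions chosen only to show what a likelihood-based score over spatial and temporal features might look like.

```python
import numpy as np

def fit_gaussian(features, eps=1e-6):
    """Estimate mean and regularized covariance from rows of real-data features.
    (Hypothetical helper; a Gaussian model is an assumption of this sketch.)"""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, cov

def gaussian_log_likelihood(x, mean, cov):
    """Log-density of each row of x under N(mean, cov)."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def score_video(frame_embeddings, spatial_stats, temporal_stats):
    """Sum of mean spatial and mean temporal log-likelihoods.
    Higher means the video looks more like the real-data statistics."""
    spatial = gaussian_log_likelihood(frame_embeddings, *spatial_stats).mean()
    deltas = np.diff(frame_embeddings, axis=0)  # frame-to-frame changes
    temporal = gaussian_log_likelihood(deltas, *temporal_stats).mean()
    return spatial + temporal
```

In this sketch, `spatial_stats` would be fitted once on per-frame embeddings of real videos and `temporal_stats` on their frame-to-frame differences; no generated videos are needed at any point, which is the property the abstract describes as training-free and model-agnostic.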

Computer Vision

Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
