Mar 5, 2026 · arXiv:2603.05095

GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement

Xiaodong Zhu, Yuanming Zheng, Suting Wang, Junqi Yang, Yuhong Yang, Weiping Tu, Zhongyuan Wang

AI Summary

The paper introduces GEM-TFL, a novel weakly supervised temporal forgery localization (WS-TFL) framework that bridges the gap between weak and full supervision by reformulating binary labels into multi-dimensional latent attributes via EM-based optimization. GEM-TFL further incorporates a training-free temporal consistency refinement to smooth frame-level predictions and a graph-based proposal refinement module to model temporal-semantic relationships. Experiments on benchmark datasets demonstrate that GEM-TFL achieves state-of-the-art performance in WS-TFL, significantly closing the performance gap with fully supervised methods.
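The summary does not spell out how the training-free temporal consistency refinement works. As a rough illustration only, the sketch below assumes it behaves like a plain centered moving average over frame-level forgery probabilities, followed by thresholding into contiguous segments. `smooth_scores`, `extract_segments`, the window size and the 0.5 threshold are all hypothetical choices, not GEM-TFL's actual procedure.

```python
from typing import List, Tuple


def smooth_scores(scores: List[float], window: int = 3) -> List[float]:
    """Hypothetical training-free refinement: centered moving average.

    The window shrinks at the sequence edges so every frame keeps a score.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for t in range(len(scores)):
        lo = max(0, t - half)
        hi = min(len(scores), t + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def extract_segments(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return [start, end) frame-index pairs where the score exceeds threshold."""
    segments = []
    start = None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t  # a candidate forged segment opens
        elif s <= threshold and start is not None:
            segments.append((start, t))  # the segment closes before frame t
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # segment runs to the last frame
    return segments


# Toy frame-level forgery probabilities (illustrative values, not from the paper).
raw = [0.1, 0.7, 0.1, 0.1, 0.8, 0.9, 0.3, 0.9, 0.8, 0.1]
print(extract_segments(raw))                 # → [(1, 2), (4, 6), (7, 9)]
print(extract_segments(smooth_scores(raw)))  # → [(4, 9)]
```

In this toy sequence the isolated spike at frame 1 is suppressed and the dip at frame 6 is filled, so three fragmented raw segments become one. A fixed moving average is used here only because it is training-free and easy to verify; it can also blur genuinely short forgeries.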

Key Contribution

By reformulating binary labels into multi-dimensional latent attributes via EM, GEM-TFL significantly narrows the performance gap between weakly and fully supervised temporal forgery localization.
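The page does not give the EM formulation itself. To show the general idea of EM turning a single binary video-level label into finer-grained soft targets, the sketch below fits a two-component, one-dimensional Gaussian mixture to the frame scores of a video labelled fake and reads off each frame's posterior probability of belonging to the high-score component. This is a swapped-in, deliberately simplified technique: GEM-TFL's latent attributes are multi-dimensional, and the function name `em_two_component`, the initialisation and the variance floor are assumptions.

```python
import math
from typing import List, Tuple


def log_gaussian(x: float, mean: float, var: float) -> float:
    # Log density of a 1-D Gaussian; working in log space avoids underflow.
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def em_two_component(
    xs: List[float], iters: int = 50, min_var: float = 1e-4
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Hypothetical EM for a 2-component 1-D Gaussian mixture over frame scores.

    Returns (means, variances, weights, soft_labels), where soft_labels[t] is the
    posterior probability that frame t belongs to the high-score component.
    """
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    # Initialise means at the extremes so component 1 tracks high (forged-like) scores.
    mu = [min(xs), max(xs)]
    var = [overall_var, overall_var]
    pi = [0.5, 0.5]
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame,
        # normalised with the log-sum-exp trick.
        resp = []
        for x in xs:
            logs = [math.log(pi[k]) + log_gaussian(x, mu[k], var[k]) for k in range(2)]
            peak = max(logs)
            w = [math.exp(lp - peak) for lp in logs]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: re-estimate mixture weights, means and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
    return mu, var, pi, [r[1] for r in resp]


# Toy frame scores from one video carrying only the binary label "fake".
frame_scores = [0.05, 0.1, 0.08, 0.12, 0.9, 0.85, 0.95, 0.88, 0.1, 0.07]
means, variances, weights, soft_labels = em_two_component(frame_scores)
```

Frames 4 to 7 end up with soft labels near 1 and the rest near 0, so EM has produced frame-level targets that the binary video label alone does not provide. The log-space E-step keeps responsibilities finite even when a variance reaches the floor.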

Abstract

Temporal Forgery Localization (TFL) aims to precisely identify manipulated segments within videos or audio streams, providing interpretable evidence for multimedia forensics and security. While most existing TFL methods rely on dense frame-level labels in a fully supervised manner, Weakly Supervised TFL (WS-TFL) reduces labeling cost by learning only from binary video-level labels. However, current WS-TFL approaches suffer from mismatched training and inference objectives, limited supervision from binary labels, gradient blockage caused by non-differentiable top-k aggregation, and the absence of explicit modeling of inter-proposal relationships. To address these issues, we propose GEM-TFL (Graph-based EM-powered Temporal Forgery Localization), a two-phase classification-regression framework that effectively bridges the supervision gap between training and inference. Built upon this foundation, (1) we enhance weak supervision by reformulating binary labels into multi-dimensional latent attributes through an EM-based optimization process; (2) we introduce a training-free temporal consistency refinement that realigns frame-level predictions for smoother temporal dynamics; and (3) we design a graph-based proposal refinement module that models temporal-semantic relationships among proposals for globally consistent confidence estimation. Extensive experiments on benchmark datasets demonstrate that GEM-TFL achieves more accurate and robust temporal forgery localization, substantially narrowing the gap with fully supervised methods.
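The abstract describes the graph-based proposal refinement only at a high level. As an illustration under stated assumptions, the sketch below treats each proposal as a node, weights edges by temporal IoU plus non-negative cosine similarity of hypothetical per-proposal feature vectors, and refines confidences by a few steps of neighbour averaging. The edge definition, `alpha`, `steps` and the `refine_scores` function are invented for this example; the paper's actual graph construction and message passing may differ.

```python
import math
from typing import List, Sequence, Tuple

# (start_sec, end_sec, confidence, feature_vector); the feature vector is hypothetical.
Proposal = Tuple[float, float, float, Sequence[float]]


def temporal_iou(a: Proposal, b: Proposal) -> float:
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def refine_scores(proposals: List[Proposal], alpha: float = 0.7, steps: int = 2) -> List[float]:
    """Blend each proposal's confidence with a weighted mean of its graph neighbours."""
    n = len(proposals)
    # Edge weight = temporal overlap + non-negative semantic similarity (assumed design).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = temporal_iou(proposals[i], proposals[j]) + max(
                    0.0, cosine(proposals[i][3], proposals[j][3])
                )
    scores = [p[2] for p in proposals]
    for _ in range(steps):
        updated = []
        for i in range(n):
            total = sum(w[i])
            # Isolated nodes (no edges) keep their own score.
            neighbour = (
                sum(w[i][j] * scores[j] for j in range(n)) / total if total > 0 else scores[i]
            )
            updated.append(alpha * scores[i] + (1.0 - alpha) * neighbour)
        scores = updated
    return scores


proposals: List[Proposal] = [
    (0.0, 4.0, 0.9, [1.0, 0.0]),    # A
    (1.0, 5.0, 0.3, [1.0, 0.0]),    # B: overlaps A, same semantics
    (20.0, 22.0, 0.6, [0.0, 1.0]),  # C: isolated, different semantics
]
refined = refine_scores(proposals)
```

Proposal B borrows confidence from the overlapping, semantically similar A (0.3 rises to about 0.55), A is tempered (0.9 falls to about 0.65), and the isolated C keeps its score because it has no edges. This naive averaging can also pull down a genuinely confident proposal; that is a limitation of the sketch, not a claim about GEM-TFL.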

Computer Vision · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
