This paper introduces OmniShotCut, a shot boundary detection (SBD) method that formulates SBD as structured relational prediction: a shot query-based dense video Transformer jointly estimates shot ranges together with intra-shot and inter-shot relations. To avoid the imprecise manual labels that existing methods rely on, the authors train on data from a fully synthetic transition synthesis pipeline that generates transitions with precise boundaries. They also introduce OmniShotCutBench, a new benchmark for holistic evaluation. The reported experiments show improved performance and more interpretable boundaries compared to state-of-the-art SBD methods.
OmniShotCut is a Transformer-based approach to shot boundary detection that reports higher accuracy and more interpretable boundaries than prior state-of-the-art methods, using a relational prediction formulation and synthetic training data.
Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-diversity annotations and outdated benchmarks. To alleviate these limitations, we propose OmniShotCut, which formulates SBD as structured relational prediction, jointly estimating shot ranges with intra-shot and inter-shot relations using a shot query-based dense video Transformer. To avoid imprecise manual labeling, we adopt a fully synthetic transition synthesis pipeline that automatically reproduces major transition families with precise boundaries and parameterized variants. We also introduce OmniShotCutBench, a modern wide-domain benchmark enabling holistic and diagnostic evaluation.
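To make the idea of synthetic transitions with precise boundaries concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It joins two toy shots with a linear cross-dissolve and returns the exact frame range of the transition as a label. The names `TransitionLabel` and `synthesize_dissolve`, the linear blend, and the representation of a frame as a flat list of intensities are all assumptions made for illustration; the abstract does not specify how the transitions are generated.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy frame representation (assumption): a flat list of pixel intensities in [0, 1].
Frame = List[float]


@dataclass
class TransitionLabel:
    """Hypothetical label record; field names are illustrative only."""
    kind: str   # transition family, here "dissolve" or "hard_cut"
    start: int  # index of the first transition frame in the output video
    end: int    # index one past the last transition frame


def synthesize_dissolve(
    shot_a: List[Frame], shot_b: List[Frame], duration: int
) -> Tuple[List[Frame], TransitionLabel]:
    """Join two shots with a linear cross-dissolve lasting `duration` frames.

    The last `duration` frames of shot_a are blended with the first
    `duration` frames of shot_b. A duration of 0 degenerates to a hard cut.
    Because the transition is constructed, its boundaries are known exactly.
    """
    if duration < 0 or duration > min(len(shot_a), len(shot_b)):
        raise ValueError("duration must fit inside both shots")

    head = shot_a[: len(shot_a) - duration]  # untouched part of shot A
    tail = shot_b[duration:]                 # untouched part of shot B

    blended: List[Frame] = []
    for i in range(duration):
        # alpha rises strictly between 0 and 1 across the transition
        alpha = (i + 1) / (duration + 1)
        frame_a = shot_a[len(shot_a) - duration + i]
        frame_b = shot_b[i]
        blended.append(
            [(1 - alpha) * pa + alpha * pb for pa, pb in zip(frame_a, frame_b)]
        )

    video = head + blended + tail
    kind = "dissolve" if duration > 0 else "hard_cut"
    label = TransitionLabel(kind=kind, start=len(head), end=len(head) + duration)
    return video, label


# Usage: a black shot dissolving into a white shot over 3 frames.
shot_a = [[0.0] * 4 for _ in range(10)]
shot_b = [[1.0] * 4 for _ in range(10)]
video, label = synthesize_dissolve(shot_a, shot_b, duration=3)
```

Varying `duration` here stands in for the "parameterized variants" mentioned in the abstract; other transition families such as wipes or fades would need their own blending rule but could return the same kind of exact label.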