The paper introduces SGC, a new metric for evaluating 3D spatial geometric consistency in dynamically generated videos by measuring the divergence among camera poses estimated from distinct local regions. SGC addresses two limitations of existing evaluation methods: fidelity-centric metrics like FVD are insensitive to geometric distortions, and consistency-focused benchmarks often penalize valid foreground dynamics. Experiments on real and generative videos show that SGC identifies geometric inconsistencies that existing metrics miss.
Generative videos might look great, but a new metric reveals they often suffer from jarring 3D spatial inconsistencies that existing metrics miss.
Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each sub-region, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generative videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.
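The final step, comparing the per-region camera poses, can be illustrated with a minimal sketch. The abstract does not specify the divergence measure, so the choice below (mean pairwise geodesic rotation angle and mean pairwise translation distance) is an assumption made for illustration, not the paper's formula. The names `rotation_angle`, `pose_divergence`, and `rot_z` are hypothetical, and the poses are assumed to have already been estimated for each static sub-region by some upstream method.

```python
import math
from itertools import combinations


def rotation_angle(Ra, Rb):
    """Geodesic angle in radians between two 3x3 rotation matrices (nested lists)."""
    # trace(Ra^T Rb) equals the sum of elementwise products of Ra and Rb.
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


def pose_divergence(poses):
    """Assumed divergence measure over local camera poses.

    poses: list of (R, t) pairs, one per static sub-region, where R is a 3x3
    rotation matrix and t a 3-vector translation.
    Returns (mean pairwise rotation angle, mean pairwise translation distance).
    A geometrically consistent scene should give values near zero, because
    every static sub-region should agree on a single camera motion.
    """
    pairs = list(combinations(poses, 2))
    if not pairs:
        return 0.0, 0.0
    rot = sum(rotation_angle(Ra, Rb) for (Ra, _), (Rb, _) in pairs) / len(pairs)
    trans = sum(math.dist(ta, tb) for (_, ta), (_, tb) in pairs) / len(pairs)
    return rot, trans


def rot_z(theta):
    """Helper for the example: rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Two sub-regions that disagree about the camera motion between two frames.
inconsistent = [(rot_z(0.0), [0.0, 0.0, 0.0]), (rot_z(0.2), [3.0, 4.0, 0.0])]
rot_div, trans_div = pose_divergence(inconsistent)
```

In this toy case the two regions imply camera rotations that differ by about 0.2 radians and translations that differ by 5 units, so both divergence terms are clearly nonzero. Identical poses across all sub-regions would yield zero for both. How the two terms would be combined into a single score, and how translation scale would be normalized, is left open here because the abstract does not say.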