Shenzhen University | Mar 4, 2026 | arXiv:2603.03944

Spatial Causal Prediction in Video

Yanguang Zhao, Jie Yang, Shutong Hu, Hongbo Qiu, Guijia Zhang, Tan Kai Ze, Chia-Wen Lin, Mong-Li Lee

AI Summary

This paper introduces Spatial Causal Prediction (SCP), a novel task paradigm designed to evaluate a model's ability to infer unseen past or future spatial states in videos, going beyond visible spatio-temporal understanding. To facilitate this, the authors created SCP-Bench, a benchmark dataset with 2,500 QA pairs across 1,181 videos. Experiments on 23 state-of-the-art models using SCP-Bench revealed significant performance gaps compared to humans, highlighting limitations in temporal extrapolation and causal grounding.

Key Contribution

Current video models struggle to infer unseen spatial states and causal relationships, falling far short of human-level spatial reasoning.

Abstract

Spatial reasoning, the ability to understand spatial relations, causality, and dynamic evolution, is central to human intelligence and essential for real-world applications such as autonomous driving and robotics. Existing studies, however, primarily assess models on visible spatio-temporal understanding, overlooking their ability to infer unseen past or future spatial states. In this work, we introduce Spatial Causal Prediction (SCP), a new task paradigm that challenges models to reason beyond observation and predict spatial causal outcomes. We further construct SCP-Bench, a benchmark comprising 2,500 QA pairs across 1,181 videos spanning diverse viewpoints, scenes, and causal directions, to support systematic evaluation. Through comprehensive experiments on 23 state-of-the-art models, we reveal substantial gaps between human and model performance, limited temporal extrapolation, and weak causal grounding. We further analyze key factors influencing performance and propose perception-enhancement and reasoning-guided strategies toward advancing spatial causal intelligence. The project page is https://guangstrip.github.io/SCP-Bench.

Computer Vision, Robotics & Embodied AI, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
