AI Laboratory, Beihang, Chongqing, Northwestern, SJTU, UTokyo
May 21, 2026
arXiv:2605.22536

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Xiaolong Zhou, Yifei Liu, Ziyang Gong, Qiyue Zhao, Muyao Niu, Le Ma, Xue Yang, Hongjie Zhang, Zhihang Zhong

AI Summary

The paper introduces SpaceDG, a large-scale dataset and benchmark for evaluating how robustly Multimodal Large Language Models (MLLMs) reason about space under visual degradation. SpaceDG uses a physically grounded degradation synthesis engine built into 3D Gaussian Splatting rendering to simulate nine degradation types, generating approximately 1M QA pairs. Experiments on 25 MLLMs show that degraded inputs substantially reduce spatial reasoning performance. Finetuning on SpaceDG mitigates this drop and can even surpass human performance under degraded conditions.

Key Contribution

Visual degradations can cripple the spatial reasoning abilities of even state-of-the-art MLLMs, but finetuning on SpaceDG markedly improves robustness and can even surpass human performance under degraded conditions.

Abstract

Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens distortion, and compression artifacts. This raises a fundamental question: how robust is the spatial intelligence of current MLLMs when visual observations are imperfect? To answer this question, we introduce SpaceDG, the first large-scale dataset for degradation-aware spatial understanding. It is constructed with a physically grounded degradation synthesis engine that embeds the degradation formation process into 3D Gaussian Splatting (3DGS) rendering, enabling realistic simulation of nine degradation types. The resulting dataset contains approximately 1M QA pairs from nearly 1,000 indoor scenes. We further introduce SpaceDG-Bench, a human-verified benchmark with 1,102 questions spanning 11 reasoning categories and 9 visual degradation types, yielding over 10K VQA instances. Evaluating 25 open- and closed-source MLLMs reveals that visual degradations consistently and substantially impair spatial reasoning, exposing a critical robustness gap. Finally, we show that finetuning on SpaceDG markedly improves degradation robustness and can even surpass human performance under degraded conditions without any performance drop on clean images, highlighting the promise of degradation-aware training for robust spatial intelligence.
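The robustness gap described in the abstract can be read as the difference between accuracy on clean inputs and accuracy under each degradation. The sketch below illustrates that bookkeeping only; it is not the authors' evaluation code. The record format, the function names, the condition labels, and the toy results are all assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Return {condition: accuracy} from (condition, is_correct) pairs.

    The (condition, is_correct) record layout is a hypothetical format,
    not the SpaceDG-Bench schema.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        hits[condition] += int(is_correct)
    return {c: hits[c] / totals[c] for c in totals}


def robustness_gap(acc, clean_key="clean"):
    """Accuracy drop of each degraded condition relative to the clean one."""
    clean = acc[clean_key]
    return {c: clean - a for c, a in acc.items() if c != clean_key}


# Toy, made-up results for one model; these are not numbers from the paper.
records = [
    ("clean", True), ("clean", True), ("clean", True), ("clean", False),
    ("motion_blur", True), ("motion_blur", False),
    ("motion_blur", False), ("motion_blur", True),
    ("low_light", True), ("low_light", False),
    ("low_light", False), ("low_light", False),
]

acc = accuracy_by_condition(records)
gap = robustness_gap(acc)
```

A positive gap means the model answers fewer questions correctly under that degradation than on clean images. A finetuned model that is robust in the sense the abstract describes would show gaps close to zero while its clean accuracy stays unchanged.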

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

