Mar 26, 2026 | arXiv:2603.25823

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

Haonan Han, Jiancheng Huang, Xiaopeng Sun, Junyan He, Rui Yang, Jie Hu, Xiaojiang Peng, Lin Ma, Xiaoming Wei, Xiu Li

AI Summary

The paper introduces ViGoR-Bench, a new benchmark to evaluate the reasoning capabilities of visual generative models across image and video tasks. ViGoR uses a dual-track evaluation, assessing both intermediate generative steps and final outputs, along with an evidence-grounded automated judge for high human alignment. Experiments on 20+ models reveal significant reasoning deficits in even state-of-the-art systems, highlighting the gap between visual fidelity and actual reasoning ability.

Key Contribution

Despite impressive visual fidelity, today's generative models still stumble on basic physical, causal, and spatial reasoning tasks, revealing a "logical desert" beneath the surface.

Abstract

Beneath the stunning visual fidelity of modern AIGC models lies a "logical desert", where systems fail tasks that require physical, causal, or complex spatial reasoning. Current evaluations largely rely on superficial metrics or fragmented benchmarks, creating a "performance mirage" that overlooks the generative process. To address this, we introduce ViGoR (Vision-Generative Reasoning-centric Benchmark), a unified framework designed to dismantle this mirage. ViGoR distinguishes itself through four key innovations: 1) holistic cross-modal coverage bridging Image-to-Image and Video tasks; 2) a dual-track mechanism evaluating both intermediate processes and final results; 3) an evidence-grounded automated judge ensuring high human alignment; and 4) granular diagnostic analysis that decomposes performance into fine-grained cognitive dimensions. Experiments on over 20 leading models reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR as a critical "stress test" for the next generation of intelligent vision models. The demo is available at https://vincenthancoder.github.io/ViGoR-Bench/

Eval Frameworks & Benchmarks | Multimodal Models | Reasoning & Chain-of-Thought


Year: 2026

Venue: N/A
