Central South UniversityFudanHITHKUApr 22, 2026arXiv:2604.20806

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

Qiguang Chen, Chengyu Luan, Jiajun Wu, Qiming Yu, Yi Yang, Yizhuo Li, Jingqi Tong, Xiachong Feng, Libo Qin, Wanxiang Che

AI Summary

This paper introduces OMIBench, a novel benchmark specifically designed to evaluate Olympiad-level reasoning in large vision-language models (LVLMs) by leveraging contextual information across multiple images. The benchmark includes diverse problems from various scientific disciplines and provides annotated rationales and evaluation protocols for assessing model performance. Experimental results reveal significant performance gaps, with top models like Gemini-3-Pro achieving only around 50% accuracy, highlighting the need for improved multi-image reasoning capabilities in LVLMs.

Key Contribution

Even the best large vision-language models struggle with multi-image reasoning, scoring only 50% on a new benchmark designed to challenge their capabilities.

Abstract

Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal reasoning benchmarks for these models often emphasize single-image analysis and fail to exploit contextual information across multiple images. We present OMIBench, a benchmark designed to evaluate Olympiad-level reasoning when the required evidence is distributed over multiple images. It contains problems from biology, chemistry, mathematics, and physics Olympiads, together with manually annotated rationales and evaluation protocols for both exact and semantic answer matching. Across extensive experiments on OMIBench, we observe meaningful performance gaps in existing models. Even the strongest LVLMs, such as Gemini-3-Pro, attain only about 50% on the benchmark. These results position OMIBench as a focused resources for studying and improving multi-image reasoning in LVLMs.

Eval Frameworks & Benchmarks Multimodal Models Reasoning & Chain-of-Thought

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

Related Papers