AI2 | Mar 12, 2026 | arXiv:2603.12249

SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning

Ziyu Chen, Yilun Zhao, Chengye Wang, Rilyn Han, Manasi S. Patwardhan, Arman Cohan

AI Summary

The authors introduce a "synthesize-and-reground" framework to generate SciMDR, a large-scale dataset of 300K QA pairs derived from 20K scientific papers, designed for training and evaluating cross-modal comprehension in scientific documents. This framework balances scale, faithfulness, and realism by first synthesizing claim-centric QA pairs and then re-embedding them into full-document contexts. Fine-tuning models on SciMDR leads to significant performance gains on scientific QA benchmarks, especially those demanding complex document-level reasoning.

Key Contribution

Training on SciMDR, a new 300K QA dataset synthesized from scientific papers, substantially boosts model performance on complex, document-level scientific reasoning tasks.

Abstract

Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-reground framework, a two-stage pipeline comprising: (1) Claim-Centric QA Synthesis, which generates faithful, isolated QA pairs and reasoning on focused segments, and (2) Document-Scale Regrounding, which programmatically re-embeds these pairs into full-document tasks to ensure realistic complexity. Using this framework, we construct SciMDR, a large-scale training dataset for cross-modal comprehension, comprising 300K QA pairs with explicit reasoning chains across 20K scientific papers. We further construct SciMDR-Eval, an expert-annotated benchmark to evaluate multimodal comprehension within full-length scientific workflows. Experiments demonstrate that models fine-tuned on SciMDR achieve significant improvements across multiple scientific QA benchmarks, particularly in those tasks requiring complex document-level reasoning.
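The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (`Segment`, `QAPair`, `DocumentTask`), the functions `synthesize_qa` and `reground`, and the bracketed segment-ID context format are all hypothetical, and the synthesis step is a stub standing in for whatever model-based generation the paper actually uses. The sketch only shows the shape of the idea: create a QA pair from one focused segment, then attach it to the full document so the task must be solved at document scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    segment_id: str  # hypothetical identifier, e.g. "sec3" or "fig2"
    text: str


@dataclass
class QAPair:
    question: str
    answer: str
    reasoning: str
    source_segment_id: str  # the focused segment the pair was derived from


@dataclass
class DocumentTask:
    context: str  # the full document, not just the source segment
    question: str
    answer: str
    reasoning: str
    evidence_segment_id: str


def synthesize_qa(segment: Segment) -> QAPair:
    """Stage 1 stub (assumption): a real pipeline would generate the pair
    with a model; here the segment text itself serves as the answer."""
    return QAPair(
        question=f"What is claimed in segment {segment.segment_id}?",
        answer=segment.text,
        reasoning=f"The claim appears in segment {segment.segment_id}.",
        source_segment_id=segment.segment_id,
    )


def reground(pair: QAPair, document: List[Segment]) -> DocumentTask:
    """Stage 2 sketch: re-embed an isolated QA pair into the full document."""
    known_ids = {s.segment_id for s in document}
    if pair.source_segment_id not in known_ids:
        raise ValueError(
            f"segment {pair.source_segment_id!r} is not part of this document"
        )
    # Concatenate every segment so the evidence must be located in context.
    context = "\n\n".join(f"[{s.segment_id}] {s.text}" for s in document)
    return DocumentTask(
        context=context,
        question=pair.question,
        answer=pair.answer,
        reasoning=pair.reasoning,
        evidence_segment_id=pair.source_segment_id,
    )
```

In this sketch the answer and reasoning stay fixed during regrounding, and only the context grows from a single segment to the whole document. That is one plausible way to keep faithfulness from the first stage while adding realistic context length in the second.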

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 53

Year: 2026

Venue: N/A
