Korea University · Myongji University · Mar 30, 2026 · arXiv:2603.28026

When Choices Become Priors: Contrastive Decoding for Scientific Figure Multiple-Choice QA

Taeyun Roh, Eun-yeong Jo, Wonjune Jang, Jaewoo Kang

AI Summary

The paper identifies a bias in scientific figure multiple-choice question answering (MCQA): answer choices act as priors, leading models to select scientifically plausible options even when the figure does not support them. To mitigate this, the authors introduce SCICON, a training-free decoding method that contrasts image-conditioned option scores with text-only option scores. SCICON consistently improves accuracy across three scientific figure QA benchmarks and three model backbones, demonstrating the effectiveness of discounting choice-induced priors.

Key Contribution

Scientific figure QA models are often misled by the answer choices themselves; a simple, training-free decoding strategy that contrasts image-grounded option scores with text-only option scores consistently improves accuracy across benchmarks and model backbones.
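The contrastive scoring rule can be written compactly as below. This is a sketch in our own notation, not the paper's: s(·) stands for whatever per-option score the model produces (assumed here to be the log-probability of the option text), I is the figure, q is the question, and C is the set of answer choices. The paper may normalize the scores or weight the text-only term differently.

```latex
\hat{c} = \arg\max_{c \in \mathcal{C}} \Big[\, s(c \mid I, q) - s(c \mid q) \,\Big]
```

As an illustration with made-up numbers (not results from the paper): suppose two options A and B have image-conditioned scores s(A | I, q) = -1.0 and s(B | I, q) = -1.2, and text-only scores s(A | q) = -0.5 and s(B | q) = -2.0. Standard decoding picks A. The contrastive scores are -1.0 - (-0.5) = -0.5 for A and -1.2 - (-2.0) = 0.8 for B, so B is selected: most of A's advantage was already present without the figure, whereas B gains the most from seeing it.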

Abstract

Scientific figure multiple-choice question answering (MCQA) requires models to reason over diverse visual evidence, ranging from charts and multipanel figures to microscopy and biomedical images. However, this setting suffers from a distinctive bias: answer choices themselves can act as priors, steering multimodal models toward scientifically plausible options even when the figure supports a different answer. We investigate this failure mode through a simple question: what if decoding explicitly discounts what the model would prefer from text alone, so as to favor figure-grounded evidence? To this end, we propose SCICON, a training-free decoding method that scores each candidate by subtracting a text-only option score from its image-conditioned counterpart. Unlike prior contrastive decoding approaches that mitigate hallucinations by contrasting original inputs with distorted images or perturbed instructions, SCICON directly targets the choice-induced prior encoded in candidate text. Across three scientific figure QA benchmarks and three model backbones, SCICON consistently improves accuracy over standard decoding baselines. These results show that decoding against choice-induced priors is an effective and simple way to improve figure-grounded reasoning in scientific MCQA.
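A minimal Python sketch of "subtracting a text-only option score from its image-conditioned counterpart" follows. It assumes the model can return one scalar score per option (for example, an option log-probability) both with and without the figure. The names `score_with_image`, `score_text_only`, `contrastive_option_scores`, and `pick_option` are hypothetical illustrations, not SCICON's actual code or API.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: maps an option string to a scalar score,
# assumed here to be the log-probability the model assigns to that option.
OptionScorer = Callable[[str], float]


def contrastive_option_scores(
    options: Sequence[str],
    score_with_image: OptionScorer,
    score_text_only: OptionScorer,
) -> dict[str, float]:
    """Image-conditioned score minus text-only score, per option."""
    return {opt: score_with_image(opt) - score_text_only(opt) for opt in options}


def pick_option(
    options: Sequence[str],
    score_with_image: OptionScorer,
    score_text_only: OptionScorer,
) -> str:
    """Select the option whose score gains the most from seeing the figure."""
    scores = contrastive_option_scores(options, score_with_image, score_text_only)
    # max() returns the first maximal option, so ties follow `options` order.
    return max(options, key=lambda opt: scores[opt])


if __name__ == "__main__":
    # Illustrative precomputed log-probabilities (not data from the paper).
    image_logp = {"A": -0.9, "B": -1.1, "C": -2.3}
    text_logp = {"A": -0.4, "B": -1.8, "C": -2.0}
    # Standard decoding would pick "A" (highest image-conditioned score);
    # the contrastive rule favors the option the figure boosts most.
    print(pick_option(list(image_logp), image_logp.__getitem__, text_logp.__getitem__))  # B
```

Passing scorers as callables keeps the sketch independent of any particular model backend. In practice both calls would typically use the same model and prompt template, differing only in whether the figure is included, so that the subtraction isolates the figure's contribution.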

Eval Frameworks & Benchmarks · Multimodal Models · Scientific Discovery & Drug Design

