Mar 17, 2026 · arXiv:2603.16253

Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

AI Summary

The paper introduces Explicit Visual Premise Verification (EVPV), a method to improve the reliability of Vision-Language Process Reward Models (VL-PRMs) by explicitly verifying the visual premises of each reasoning step. EVPV prompts the policy model to generate a step-wise visual checklist of the visual facts each step requires, then matches these claims against structured visual constraints independently extracted from the image, producing a scalar visual reliability score. By using this score to calibrate PRM step rewards, EVPV decouples perceptual uncertainty from logical evaluation, improving step-level verification and Best-of-N reranking accuracy on VisualProcessBench and six multimodal reasoning benchmarks.
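The summary describes the pipeline (checklist, constraint matching, reliability score, reward calibration) but not the exact matching rule or gating function. Below is a minimal sketch under stated assumptions: checklist claims and extracted constraints are hypothetical (subject, relation, value) triples compared by exact match, reliability is the fraction of supported claims, and the gate is linear with a floor. The names `Step`, `visual_reliability`, `gated_reward`, and `floor` are illustrative only, not the API of the released code.

```python
from dataclasses import dataclass, field

# Hypothetical constraint format: (subject, relation, value),
# e.g. ("angle ABC", "equals", "90 degrees").
Constraint = tuple[str, str, str]


@dataclass
class Step:
    text: str
    prm_reward: float  # raw PRM score for this step, assumed to lie in [0, 1]
    # Visual facts the step relies on; an empty list marks a purely logical step.
    checklist: list[Constraint] = field(default_factory=list)


def visual_reliability(checklist: list[Constraint], constraints: set[Constraint]) -> float:
    """Fraction of checklist claims backed by an extracted constraint (assumed metric)."""
    if not checklist:
        return 1.0
    supported = sum(1 for claim in checklist if claim in constraints)
    return supported / len(checklist)


def gated_reward(step: Step, constraints: set[Constraint], floor: float = 0.2) -> float:
    """Scale down the reward of a visually dependent step when its premises are unsupported."""
    if not step.checklist:
        return step.prm_reward  # not visually dependent: reward passes through unchanged
    rel = visual_reliability(step.checklist, constraints)
    gate = floor + (1.0 - floor) * rel  # linear gate: equals floor at rel=0, 1.0 at rel=1
    return step.prm_reward * gate


constraints = {("angle ABC", "equals", "90 degrees"), ("AB", "length", "3")}
grounded = Step("ABC is a right angle, so ...", 0.9, [("angle ABC", "equals", "90 degrees")])
hallucinated = Step("Since AB = 5, ...", 0.9, [("AB", "length", "5")])

print(round(gated_reward(grounded, constraints), 3))      # 0.9
print(round(gated_reward(hallucinated, constraints), 3))  # 0.18
```

Both steps receive the same raw PRM score, but only the one whose premise matches an extracted constraint keeps it. A multiplicative gate leaves the logical score untouched when premises check out; a threshold or sigmoid gate would be an equally plausible choice.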

Key Contribution

VL-PRMs often reward hallucinated visual premises and penalize correct, grounded statements. This work shows that explicitly verifying the visual premises each step depends on mitigates both failure modes and consistently improves Best-of-N reranking accuracy over strong baselines.
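To see why attenuating a single step can change which answer wins, consider Best-of-N reranking, where each candidate solution's step rewards are aggregated into one score and the top-scoring candidate is selected. The sketch below assumes min aggregation by default (a common PRM choice, with mean as an alternative); the paper's actual aggregation rule is not stated here, and `candidate_score` and `best_of_n` are hypothetical names. The reward values are made up for illustration.

```python
from statistics import mean


def candidate_score(step_rewards: list[float], agg: str = "min") -> float:
    """Collapse per-step rewards into a single solution-level score."""
    if not step_rewards:
        raise ValueError("candidate has no steps")
    if agg == "min":
        return min(step_rewards)  # a solution is only as good as its weakest step
    if agg == "mean":
        return mean(step_rewards)
    raise ValueError(f"unknown aggregation: {agg}")


def best_of_n(candidates: list[list[float]], agg: str = "min") -> int:
    """Return the index of the highest-scoring candidate."""
    scores = [candidate_score(c, agg) for c in candidates]
    return max(range(len(scores)), key=lambda i: scores[i])


# Candidate 0 relies on a hallucinated visual premise in its second step;
# candidate 1 is fully grounded but receives slightly lower raw scores.
raw = [[0.9, 0.9, 0.8], [0.7, 0.75, 0.72]]
# Same candidates after reliability gating attenuated the unsupported step.
gated = [[0.9, 0.18, 0.8], [0.7, 0.75, 0.72]]

print(best_of_n(raw))    # 0: the false positive wins
print(best_of_n(gated))  # 1: the grounded solution wins
```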

Abstract

Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box judges: a low step score may reflect a genuine reasoning mistake or simply the verifier's misperception of the image. This entanglement between perception and reasoning leads to systematic false positives (rewarding hallucinated visual premises) and false negatives (penalizing correct grounded statements), undermining both reranking and error localization. We introduce Explicit Visual Premise Verification (EVPV), a lightweight verification interface that conditions step scoring on the reliability of the visual premises a step depends on. The policy is prompted to produce a step-wise visual checklist that makes required visual facts explicit, while a constraint extractor independently derives structured visual constraints from the input image. EVPV matches checklist claims against these constraints to compute a scalar visual reliability signal, and calibrates PRM step rewards via reliability gating: rewards for visually dependent steps are attenuated when reliability is low and preserved when reliability is high. This decouples perceptual uncertainty from logical evaluation without per-step tool calls. Experiments on VisualProcessBench and six multimodal reasoning benchmarks show that EVPV improves step-level verification and consistently boosts Best-of-N reranking accuracy over strong baselines. Furthermore, injecting controlled corruption into the extracted constraints produces monotonic performance degradation, providing causal evidence that the gains arise from constraint fidelity and explicit premise verification rather than incidental prompt effects. Code is available at: https://github.com/Qwen-Applications/EVPV-PRM
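The corruption experiment in the abstract measures downstream accuracy; the toy below only illustrates the mechanism it probes, namely that a fully grounded step loses reliability as the extracted constraints it is checked against are corrupted. It assumes, hypothetically, that constraints are (subject, relation, value) triples, that corruption overwrites the value field of a chosen fraction of them, and that reliability is the fraction of checklist claims found among the constraints. `corrupt_constraints` and `reliability` are illustrative names, not the paper's protocol.

```python
import random

# Hypothetical constraint format: (subject, relation, value).
Constraint = tuple[str, str, str]


def corrupt_constraints(constraints: set[Constraint], rate: float, seed: int = 0) -> set[Constraint]:
    """Overwrite the value field of round(rate * n) constraints with a wrong value."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    items = sorted(constraints)  # fixed order so the seed fully determines the result
    k = round(rate * len(items))
    victims = set(random.Random(seed).sample(range(len(items)), k))
    return {
        (s, r, v + " [corrupted]") if i in victims else (s, r, v)
        for i, (s, r, v) in enumerate(items)
    }


def reliability(checklist: list[Constraint], constraints: set[Constraint]) -> float:
    """Fraction of checklist claims that appear among the (possibly corrupted) constraints."""
    if not checklist:
        return 1.0
    return sum(claim in constraints for claim in checklist) / len(checklist)


clean = {
    ("AB", "length", "3"),
    ("BC", "length", "4"),
    ("angle ABC", "equals", "90 degrees"),
    ("D", "lies on", "AC"),
}
checklist = sorted(clean)  # a fully grounded step that cites every clean fact

for rate in (0.0, 0.25, 0.5, 0.75, 1.0):
    rel = reliability(checklist, corrupt_constraints(clean, rate))
    print(f"corruption={rate:.2f}  reliability={rel:.2f}")
```

In this toy, reliability for the grounded step falls from 1.00 to 0.00 in equal steps, so any gate driven by it would increasingly suppress correct steps. That is the kind of monotonic dependence on constraint fidelity the abstract's experiment is designed to expose.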

Multimodal Models · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
