CUHK · Apr 6, 2026 · arXiv:2604.04500

Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward

Shizhan Gong, Minda Hu, Qiyuan Zhang, Qi Dou

AI Summary

This paper introduces Saliency-R1, a framework that improves the interpretability and faithfulness of vision-language models (VLMs) by aligning model-generated saliency maps with human-annotated bounding boxes during training. The authors use a novel saliency map technique that highlights the image regions contributing to each generated token without additional computational overhead. Using the overlap between these saliency maps and the human annotations as the reward in a Group Relative Policy Optimization (GRPO) framework, Saliency-R1 encourages the model to attend to relevant visual regions while reasoning, which improves faithfulness, interpretability, and overall task performance.
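The summary does not spell out how "overlap" is measured. A minimal sketch, assuming the saliency map is a grid of non-negative per-patch scores and the annotations are axis-aligned boxes in the same patch coordinates, is to reward the fraction of total saliency mass that falls inside the union of the boxes. The function name `saliency_box_reward` and the box convention are hypothetical illustrations, not the paper's definition.

```python
from typing import Sequence, Tuple

# Hypothetical box convention: (x0, y0, x1, y1), half-open, in patch-grid units.
Box = Tuple[int, int, int, int]


def saliency_box_reward(saliency: Sequence[Sequence[float]], boxes: Sequence[Box]) -> float:
    """Fraction of saliency mass inside the union of the annotated boxes.

    This is one plausible overlap measure (an assumption); the paper's
    exact reward may be defined differently.
    """
    total = 0.0
    inside = 0.0
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            v = max(float(value), 0.0)  # clamp negative scores to zero
            total += v
            # A patch counts once even if several boxes cover it (union).
            if any(x0 <= x < x1 and y0 <= y < y1 for (x0, y0, x1, y1) in boxes):
                inside += v
    if total == 0.0:
        return 0.0  # an empty map earns no reward
    return inside / total


# Toy 4x4 map: 4.0 of the 5.0 total mass lies inside the box.
sal = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
print(saliency_box_reward(sal, [(1, 1, 3, 2)]))  # 0.8
```

Normalizing by total mass keeps the reward in [0, 1] and stops a model from gaming it by simply making the whole map brighter. An IoU between a thresholded map and the boxes would be another reasonable choice.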

Key Contribution

Force your VLMs to *show their work*: Saliency-R1 aligns model attention with human-annotated visual cues, boosting faithfulness and interpretability with a saliency map that adds no extra computational overhead.

Abstract

Vision-language models (VLMs) have achieved remarkable success across diverse tasks. However, concerns about their trustworthiness persist, particularly their tendency to rely more on textual cues than on visual evidence and the risk of producing ungrounded or fabricated responses. To address these issues, we propose Saliency-R1, a framework for improving the interpretability and faithfulness of VLM reasoning. Specifically, we introduce a novel saliency map technique that efficiently highlights critical image regions contributing to generated tokens without additional computational overhead. This technique can further be extended to trace how visual information flows through the reasoning process to the final answers, revealing the alignment between the thinking process and the visual context. We use the overlap between the saliency maps and human-annotated bounding boxes as the reward function, and apply Group Relative Policy Optimization (GRPO) to align the salient parts with the critical regions, encouraging models to focus on relevant areas when conducting reasoning. Experiments show that Saliency-R1 improves reasoning faithfulness, interpretability, and overall task performance.
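GRPO, as commonly described, samples a group of responses for the same prompt and scores each one relative to its siblings: the advantage of response i is its reward minus the group mean, divided by the group standard deviation. The minimal sketch below computes those advantages for a group of scalar rewards, which in this setting could be saliency-alignment scores (possibly combined with other rewards; the abstract does not say). The function name and the `eps` stabilizer are illustrative assumptions, and implementations differ on whether they use the population or sample standard deviation.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its own group.

    Uses the population standard deviation; some GRPO implementations use
    the sample (n - 1) version instead. `eps` avoids division by zero when
    every response in the group earns the same reward.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a hypothetical saliency reward.
adv = group_relative_advantages([0.8, 0.2, 0.5, 0.5])
print([round(a, 3) for a in adv])  # [1.414, -1.414, 0.0, 0.0]
```

Because advantages are relative within a group, a response is pushed up only when its evidence lies inside the annotated regions more than that of the other samples for the same question, and a group where all responses score equally contributes no gradient signal.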

Interpretability & Mechanistic Interp · Multimodal Models · Reasoning & Chain-of-Thought

