Mar 17, 2026 | arXiv:2603.16445

Visual Distraction Undermines Moral Reasoning in Vision-Language Models

Xinyi Yang, Chenheng Xu, Weijun Hong, Ce Mo, Yixin Zhu

AI Summary

This paper introduces Moral Dilemma Simulation (MDS), a new multimodal benchmark grounded in Moral Foundation Theory (MFT), to evaluate moral reasoning in Vision-Language Models (VLMs). The study reveals that visual inputs can override text-based safety mechanisms in VLMs, leading to intuition-driven moral decisions that differ from those made in text-only contexts. Through orthogonal manipulation of visual and contextual variables, the authors demonstrate that the vision modality activates intuition-like pathways, undermining the more deliberate reasoning patterns typically observed in text-only scenarios.

Key Contribution

Visual inputs can override the text-based safety mechanisms of VLMs, shifting them toward intuition-driven moral decisions that are less safe than the ones they make in text-only contexts.

Abstract

Moral reasoning is fundamental to safe Artificial Intelligence (AI), yet ensuring its consistency across modalities becomes critical as AI systems evolve from text-based assistants to embodied agents. Current safety techniques demonstrate success in textual contexts, but concerns remain about generalization to visual inputs. Existing moral evaluation benchmarks rely on text-only formats and lack systematic control over variables that influence moral decision-making. Here we show that visual inputs fundamentally alter moral decision-making in state-of-the-art (SOTA) Vision-Language Models (VLMs), bypassing text-based safety mechanisms. We introduce Moral Dilemma Simulation (MDS), a multimodal benchmark grounded in Moral Foundation Theory (MFT) that enables mechanistic analysis through orthogonal manipulation of visual and contextual variables. The evaluation reveals that the vision modality activates intuition-like pathways that override the more deliberate and safer reasoning patterns observed in text-only contexts. These findings expose critical fragilities where language-tuned safety filters fail to constrain visual processing, demonstrating the urgent need for multimodal safety alignment.
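The "orthogonal manipulation" mentioned in the abstract can be pictured as a fully crossed (factorial) design, in which every level of each variable is paired with every level of the others so that the effect of one variable can be separated from the rest. The sketch below is only an illustration of that idea, not the MDS implementation: the factor names and levels (`modality`, `victim_count`, `personal_force`) are hypothetical placeholders, since the benchmark's actual variables are not listed on this page.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
# The variables actually manipulated in MDS are not specified here.
FACTORS = {
    "modality": ["text_only", "text_plus_image"],
    "victim_count": [1, 5],
    "personal_force": [False, True],
}


def build_conditions(factors):
    """Return every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def is_balanced(conditions, factors):
    """Check that each level of each factor appears equally often."""
    for name, levels in factors.items():
        counts = [sum(1 for c in conditions if c[name] == level) for level in levels]
        if len(set(counts)) != 1:
            return False
    return True


conditions = build_conditions(FACTORS)
print(len(conditions))                    # 2 * 2 * 2 = 8 conditions
print(is_balanced(conditions, FACTORS))   # True for a fully crossed design
```

Because the design is balanced, comparing responses across the two `modality` levels averages evenly over the other factors, which is what lets a modality effect be attributed to the visual input rather than to a confounded scenario variable.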

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
