Case Western, Kyoto, NII, Osaka, PolyU, RIKEN, UTokyo | Jun 1, 2026 | arXiv:2606.01914

Mechanistic Diagnostics of Spatial Lexical Bias in Multimodal Large Language Model Spatial Reasoning

Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng, Wang Yang, Sudong Cai, Shuyuan Zheng, Akiko Aizawa, Sadao Kurohashi

AI Summary

This study identifies a failure mode in multimodal large language models (MLLMs) that the authors call spatial lexical bias: adding a spatial relation word to the answer options can pull the model's decision toward the newly added option. Across nine open-weight MLLMs, models can answer a binary spatial question correctly yet consistently select an incorrect third spatial option once it is added, which points to a cause on the language side rather than the visual side. Mechanistic interpretability tools trace the bias to specific LLM-side channels and neurons, and a lightweight LLM-only DPO update mitigates it, lifting four-way robust accuracy by up to 100 points on synthetic data and by 68.0, 32.6, and 20.1 points on WhatsUp, SpatialMQA-Direct, and VSR.

Key Contribution

Adding just one spatial relation word to the answer options can lead MLLMs to consistently choose the wrong answer, and the failure is traced largely to the language side of the model rather than to the visual side.

Abstract

Multimodal large language models (MLLMs) remain unreliable on spatial multiple-choice questions, and their failures are often attributed to poorly attended visual information. In this work, we identify a complementary failure mode, spatial lexical bias: adding a spatial relation word to the answer options can attract the model's decision and make the newly added option likely to be selected. Using nine open-weight MLLMs, we show that this phenomenon is widely observed. In particular, models can answer a binary spatial question correctly, yet consistently select an incorrect third spatial option once it is added to the answer set. We isolate such binary-stable but ternary-fragile cases as diagnostic examples and apply mechanistic interpretability tools, which reveal that a substantial part of the failure originates on the language side rather than the visual side: visual attention analyses and residual-stream probes show that the correct spatial relation remains internally available on these failures, while irrelevant-option controls, activation patching, and sparse component interventions trace the bias to specific LLM-side channels and neurons. Based on this finding, we show that a lightweight LLM-only DPO update on tiny single-object-pair synthetic data mitigates the bias, lifting four-way robust accuracy by up to 100 points on synthetic data, and by 68.0, 32.6, and 20.1 points on the broader evaluation datasets WhatsUp, SpatialMQA-Direct, and VSR.
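The selection of binary-stable but ternary-fragile cases described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Trial` record layout, the field names, and the reading of "consistently" as "under every tested option ordering" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Trial:
    """Hypothetical record of one question under two answer sets."""
    question_id: str
    gold: str                        # correct spatial relation, e.g. "left"
    added_option: str                # extra spatial word added in the ternary version
    binary_answer: str               # model's choice with the two original options
    ternary_answers: Tuple[str, ...]  # model's choices across option orderings


def is_binary_stable_ternary_fragile(t: Trial) -> bool:
    """True if the binary answer is correct but the added option always wins.

    Assumption: "consistently" means the added (incorrect) option is chosen
    under every ternary ordering that was tested.
    """
    binary_ok = t.binary_answer == t.gold
    added_is_wrong = t.added_option != t.gold
    always_added = len(t.ternary_answers) > 0 and all(
        a == t.added_option for a in t.ternary_answers
    )
    return binary_ok and added_is_wrong and always_added


def select_diagnostic(trials: List[Trial]) -> List[Trial]:
    """Keep only the diagnostic (binary-stable, ternary-fragile) trials."""
    return [t for t in trials if is_binary_stable_ternary_fragile(t)]


# Toy inputs, invented purely to exercise the filter.
example_trials = [
    Trial("q1", "left", "under", "left", ("under", "under", "under")),
    Trial("q2", "left", "under", "right", ("under", "under", "under")),  # binary wrong
    Trial("q3", "left", "under", "left", ("left", "under", "left")),     # not consistent
]
diagnostic_ids = [t.question_id for t in select_diagnostic(example_trials)]
```

Only `q1` passes the filter here: `q2` already fails the binary question, and `q3` is not drawn to the added option under every ordering. A stricter or looser notion of consistency would change only the `always_added` condition.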

Interpretability & Mechanistic Interp, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
