CAS | Apr 18, 2026 | arXiv:2604.16902

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

Xinru Yan, Boxi Cao, Yao Lu, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han

AI Summary

The paper investigates modality preference in Omni-modal Large Language Models (OLLMs), revealing a shift from text-dominance in VLMs to visual preference in OLLMs. The authors quantify this preference across ten OLLMs using a new conflict-based benchmark and a modality selection rate metric. Layer-wise probing shows that modality preference emerges progressively in the mid-to-late layers, and these internal signals are used to diagnose cross-modal hallucinations.

Key Contribution

Forget text-dominance: Today's Omni-modal LLMs surprisingly favor visual inputs, creating new challenges for cross-modal reasoning.

Abstract

Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using a newly curated conflict-based benchmark and the modality selection rate metric. Our evaluation of ten representative OLLMs reveals a notable paradigm shift: unlike the "text-dominance" of traditional VLMs, most OLLMs exhibit a pronounced visual preference. To further understand the underlying mechanism, we conduct layer-wise probing and demonstrate that such modality preference is not static but emerges progressively in the mid-to-late layers. Building upon these insights, we leverage these internal signals to diagnose cross-modal hallucinations, achieving competitive performance across three downstream multi-modal benchmarks without task-specific data. Our work provides both a mechanistic understanding and a practical tool for building more trustworthy OLLMs. Our code and related resources are publicly available at: https://github.com/icip-cas/OmniPreference
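
The abstract names a modality selection rate metric but does not define it. As a rough illustration only, one plausible reading is the fraction of conflict examples in which the model's answer follows each modality. The sketch below assumes that reading; the function name `modality_selection_rate`, the label strings, and the toy data are all hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter


def modality_selection_rate(followed_modalities):
    """Return the share of examples in which each modality was followed.

    `followed_modalities` is a list with one label per conflict example,
    naming the modality whose content the model's answer agreed with
    (hypothetical labels: "text", "vision", "audio", or "other").
    This is an assumed definition, not the paper's.
    """
    total = len(followed_modalities)
    if total == 0:
        return {}
    counts = Counter(followed_modalities)
    return {modality: count / total for modality, count in counts.items()}


# Toy data: six conflict examples, invented for illustration.
followed = ["vision", "vision", "text", "vision", "audio", "other"]
rates = modality_selection_rate(followed)
```

Under this assumed definition, a model with a visual preference would show a rate for "vision" well above the rates for the other modalities on the same conflict set.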

Eval Frameworks & Benchmarks, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
