Radboud, XJTU | Apr 30, 2026 | arXiv:2604.27553

Revealing the Impact of Visual Text Style on Attribute-based Descriptions Produced by Large Visual Language Models

Xiaomeng Wang, Martha Larson, Zhengyu Zhao

AI Summary

This paper investigates how visual text style (functional vs. decorative) affects the attribute-based descriptions that Large Visual Language Models (LVLMs) generate. The authors find that even when an LVLM correctly identifies the concept a piece of text refers to, the visual style of that text still influences which attributes the model includes in its description of the concept. This reveals non-trivial leakage from visual text style into semantic inference within LVLMs.

Key Contribution

LVLMs leak visual text style into semantic inference, meaning the font of a word can change the attributes a model associates with the concept it represents.

Abstract

Visual text varies widely in style, including font, color, and size. However, when a word is read, its meaning is independent of the style in which it has been written or rendered. In this paper, we investigate whether, and how, the style in which a word is visualized in an image impacts the description that a Large Visual Language Model (LVLM) provides for the concept to which that word refers. Specifically, we investigate how functional text styles (readability-oriented, e.g., black sans-serif) versus decorative styles (display-oriented, e.g., colored cursive/script) affect LVLMs' descriptions of a concept in terms of the attributes of that concept. Our experiments study the situation in which the LVLM is able to correctly identify the concept referred to by a visual text, i.e., by a word or words rendered as an image, and in which the visual text style should therefore not influence the attribute-based description that the LVLM produces. Our experimental results reveal that even when the concept is correctly identified, text style influences the model's attribute-based descriptions of the concept. Our findings demonstrate non-trivial style leakage from text style into semantic inference and motivate style-aware evaluation and mitigation for LVLM-based multimedia systems.
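One way to make "style leakage" concrete is to compare the attribute sets a model returns for the same concept under the two styles, scoring only concepts the model read correctly in both renderings. The sketch below is a hypothetical illustration, not the authors' evaluation protocol: the `Description` record, the `style_leakage` helper, and the choice of 1 minus Jaccard similarity as the divergence measure are all assumptions made for this example, and the toy records are invented rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Description:
    """One LVLM output for a word rendered as an image (hypothetical record format)."""
    concept: str            # ground-truth concept the rendered word refers to
    predicted_concept: str  # concept the model reports having read
    style: str              # "functional" or "decorative"
    attributes: set         # attributes the model listed for the concept


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two attribute sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def style_leakage(records: list) -> dict:
    """Per-concept attribute divergence (1 - Jaccard) between the two styles.

    Only concepts identified correctly under BOTH styles are scored, so any
    remaining divergence cannot be blamed on the model misreading the word.
    """
    by_concept = {}
    for r in records:
        if r.predicted_concept != r.concept:
            continue  # drop misread renderings, mirroring the correct-identification condition
        by_concept.setdefault(r.concept, {})[r.style] = r.attributes
    scores = {}
    for concept, styles in by_concept.items():
        if "functional" in styles and "decorative" in styles:
            scores[concept] = 1.0 - jaccard(styles["functional"], styles["decorative"])
    return scores


if __name__ == "__main__":
    # Invented toy records for illustration only; not data from the paper.
    records = [
        Description("rose", "rose", "functional", {"red", "thorny", "fragrant"}),
        Description("rose", "rose", "decorative", {"red", "romantic", "elegant", "fragrant"}),
        Description("knife", "knife", "functional", {"sharp", "metal"}),
        Description("knife", "fork", "decorative", {"metal", "pronged"}),
    ]
    print(style_leakage(records))
```

In this toy run only "rose" is scored, because the decorative "knife" rendering was misread and is excluded. The two "rose" descriptions share 2 of 5 distinct attributes, giving a divergence of 0.6. A score of 0 would mean the style had no effect on the attributes listed.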

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
