Ant Group, Meituan, NJU, ZJU | Jun 8, 2026 | arXiv:2606.09195

Symbolic and Abstractive Reasoning with Complex Visual Queries

Yichi Zhang, Jingdian Lu, Zhuo Chen, Lingbing Guo, Jun Xu, Wen Zhang, Huajun Chen

AI Summary

This paper introduces the complex visual query (CVQ), a data type intended to probe and strengthen symbolic and abstractive reasoning in multi-modal large language models (MLLMs). The authors synthesize a dataset of 14 query types from multi-modal knowledge graphs by combining first-order logic operators, train MLLMs with a two-stage framework, and evaluate reasoning performance on CVQs along with cross-task and cross-scenario generalization. The reported findings show improved visual reasoning performance, suggesting that CVQs can help narrow the gap to human-like neuro-symbolic reasoning in MLLMs.

Key Contribution

Complex visual queries are proposed as a way to strengthen the reasoning capabilities of multi-modal large language models over abstract visual content.

Abstract

Understanding and reasoning over abstract visual content remains a challenge for current multi-modal large language models (MLLMs). In this paper, we explore a novel abstract data type termed complex visual query (CVQ), designed to probe symbolic and abstractive reasoning, which is a critical yet underexplored dimension of human-like neuro-symbolic reasoning for MLLMs. We present a comprehensive investigation from three perspectives: Data × Paradigm × Exploration. Specifically, we propose a scalable pipeline for synthesizing CVQs grounded in large-scale multi-modal knowledge graphs, generating a diverse dataset encompassing 14 distinct query types via systematic combinations of first-order logic operators. We further introduce a two-stage training framework that progressively equips MLLMs with robust visual reasoning capabilities. We conduct extensive experiments to rigorously evaluate MLLMs across multiple dimensions, including reasoning performance on CVQs, as well as cross-task and cross-scenario generalization. We believe our work opens new perspectives and avenues for advancing the reasoning frontiers of MLLMs.
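To make the idea of composing query types from first-order logic operators more concrete, the following is a minimal sketch over a toy knowledge graph. It is not the authors' pipeline: the graph, the entity and relation names (img_1, depicts, is_a), the helper functions, and the choice of projection, intersection, union, and negation as the operator set are all assumptions made for illustration, and the abstract does not list the 14 query types.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy multi-modal knowledge graph (all names are illustrative).
# Image nodes link to the entities they depict; entities link to categories.
KG: Set[Triple] = {
    ("img_1", "depicts", "cat"),
    ("img_1", "depicts", "dog"),
    ("img_2", "depicts", "cat"),
    ("img_2", "depicts", "sparrow"),
    ("cat", "is_a", "mammal"),
    ("dog", "is_a", "mammal"),
    ("sparrow", "is_a", "bird"),
}

# Universe of entities, needed to define negation as a set complement.
ENTITIES: Set[str] = {e for head, _, tail in KG for e in (head, tail)}


def project(sources: Set[str], relation: str) -> Set[str]:
    """Relational projection: follow `relation` from every source entity."""
    return {t for h, r, t in KG if r == relation and h in sources}


def intersect(a: Set[str], b: Set[str]) -> Set[str]:
    """Conjunction of two answer sets."""
    return a & b


def union(a: Set[str], b: Set[str]) -> Set[str]:
    """Disjunction of two answer sets."""
    return a | b


def negate(a: Set[str]) -> Set[str]:
    """Negation as complement with respect to all known entities."""
    return ENTITIES - a


# Single-hop query: what does img_1 depict?
one_hop = project({"img_1"}, "depicts")

# Two-hop query: what categories do the things in img_1 belong to?
two_hop = project(one_hop, "is_a")

# Intersection query: what is depicted in both images?
in_both = intersect(one_hop, project({"img_2"}, "depicts"))

# Intersection with negation: what is in img_1 but not in img_2?
only_first = intersect(one_hop, negate(project({"img_2"}, "depicts")))

# Union query: what is depicted in either image?
in_either = union(one_hop, project({"img_2"}, "depicts"))
```

Chaining these operators in different orders yields structurally different queries from the same graph, which is one plausible reading of "systematic combinations of first-order logic operators"; in a visual setting the anchor entities would presumably be given as images rather than names.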

Multimodal Models | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
