The paper introduces a visual program synthesis approach for visual reasoning tasks, where a model generates a symbolic program executed by a separate engine grounded in visual scenes. This approach addresses the limitations of end-to-end MLLMs in handling set-based reasoning tasks like filtering, comparison, and aggregation. Experiments on a new benchmark, Set-VQA, demonstrate that the proposed method significantly outperforms state-of-the-art baselines in complex reasoning, improving accuracy and providing more transparent behavior.
End-to-end MLLMs struggle with visual reasoning, but a program synthesis approach that explicitly represents compositional logic dramatically improves accuracy and transparency.
A user pointing their phone at a supermarket shelf and asking "Which soda has the least sugar?" poses a difficult challenge for current visual AI assistants. Such queries require not only object recognition, but explicit set-based reasoning such as filtering, comparison, and aggregation. Standard end-to-end MLLMs often fail at these tasks because they lack an explicit mechanism for compositional logic. We propose treating visual reasoning as Visual Program Synthesis, where the model first generates a symbolic program that is executed by a separate engine grounded in visual scenes. We also introduce Set-VQA, a new benchmark designed specifically for evaluating set-based visual reasoning. Experiments show that our approach significantly outperforms state-of-the-art baselines on complex reasoning tasks, producing more systematic and transparent behavior while substantially improving answer accuracy. These results demonstrate that program-driven reasoning provides a principled alternative to black-box visual-language inference.
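To make the filter/compare/aggregate idea concrete, here is a minimal sketch of what executing a synthesized symbolic program over a grounded scene might look like. The object schema, attribute names, and the tiny two-op "DSL" below are illustrative assumptions for the supermarket example, not the paper's actual interface or execution engine.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    label: str       # e.g. a detected product name
    category: str    # e.g. "soda" (hypothetical grounding output)
    sugar_g: float   # attribute read off the packaging, e.g. via OCR

def run_program(objects, program):
    """Execute a tiny symbolic program: an ordered list of (op, arg) steps."""
    result = list(objects)
    for op, arg in program:
        if op == "filter_category":
            # Set-based filtering: keep only objects in the given category.
            result = [o for o in result if o.category == arg]
        elif op == "argmin":
            # Aggregation: select the object minimizing the named attribute.
            result = min(result, key=lambda o: getattr(o, arg))
        else:
            raise ValueError(f"unknown op: {op}")
    return result

# "Which soda has the least sugar?" expressed as a two-step program
# over a toy grounded scene.
scene = [
    SceneObject("FizzPop", "soda", 39.0),
    SceneObject("LightCola", "soda", 11.0),
    SceneObject("OatBar", "snack", 7.0),
]
program = [("filter_category", "soda"), ("argmin", "sugar_g")]
answer = run_program(scene, program)
print(answer.label)  # LightCola
```

Because the program is explicit, each intermediate set is inspectable, which is the transparency advantage the abstract claims over end-to-end black-box inference.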