CAS, Northwestern · Mar 18, 2026 · arXiv:2603.17508

Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation

Chi Zhang, Xiang Feng, Qiming Zhang, Haibo Qiu, Lihuo He, Dengpan Ye, Xinbo Gao, Jing Zhang

AI Summary

The paper introduces Omni-I2C, a new benchmark for evaluating Large Multimodal Models (LMMs) in generating executable code from complex, structured digital graphics. The benchmark includes 1080 diverse samples spanning various subjects, image modalities, and programming languages, emphasizing the need for high-fidelity visual perception and precise code generation. Evaluations using Omni-I2C reveal significant performance gaps in state-of-the-art LMMs, particularly in preserving structural integrity, highlighting the challenges in multimodal code generation.

Key Contribution

Current LMMs cannot reliably turn complex images into code: on the new Omni-I2C benchmark, even state-of-the-art models fail to preserve structural integrity in complex scenarios.

Abstract

We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial challenge for the current generation of LMMs: it demands an unprecedented synergy between high-fidelity visual perception -- to parse intricate spatial hierarchies and symbolic details -- and precise generative expression -- to synthesize syntactically sound and logically consistent code. Unlike traditional descriptive tasks, Omni-I2C requires a holistic understanding where any minor perceptual hallucination or coding error leads to a complete failure in visual reconstruction. Omni-I2C features 1080 meticulously curated samples, defined by its breadth across subjects, image modalities, and programming languages. By incorporating authentic user-sourced cases, the benchmark spans a vast spectrum of digital content -- from scientific visualizations to complex symbolic notations -- each paired with executable reference code. To complement this diversity, our evaluation framework provides necessary depth; by decoupling performance into perceptual fidelity and symbolic precision, it transcends surface-level accuracy to expose the granular structural failures and reasoning bottlenecks of current LMMs. Our evaluation reveals a substantial performance gap among leading LMMs; even state-of-the-art models struggle to preserve structural integrity in complex scenarios, underscoring that multimodal code generation remains a formidable challenge. Data and code are available at https://github.com/MiliLab/Omni-I2C.

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
