The authors introduce RealChart2Code, a new benchmark of over 2,800 instances for evaluating VLMs on generating code for complex, multi-panel charts grounded in real-world data. Evaluating 14 leading VLMs, they find significant performance degradation relative to simpler benchmarks, particularly on intricate plot structures and authentic data. The benchmark is also the first to assess iterative code refinement in a multi-turn conversational setting, exposing limitations even in state-of-the-art VLMs.
VLMs still struggle to generate code for complex, multi-panel charts from real-world data, despite excelling on simpler benchmarks.
Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce \textbf{\texttt{RealChart2Code}}, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on \texttt{RealChart2Code} reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions. We release the benchmark and code at \url{https://github.com/Speakn0w/RealChart2Code}.