mm-webagent | Apr 16, 2026 | arXiv:2604.15309

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Qiuchao Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo

AI Summary

The paper introduces MM-WebAgent, a hierarchical agentic framework that leverages AIGC tools for multimodal webpage generation by coordinating element creation through hierarchical planning and self-reflection. This approach addresses the style inconsistency and poor global coherence issues that arise when AIGC tools are directly integrated into automated webpage generation. Experiments show that MM-WebAgent outperforms existing code-generation and agent-based methods, particularly in multimodal element generation and integration, as validated by a new benchmark and multi-level evaluation protocol.

Key Contribution

Hierarchical planning and self-reflection can make AI-generated webpages coherent and visually consistent, even when using diverse AIGC tools.

Abstract

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
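The abstract describes the pipeline only at a high level: a global plan, per-element generation with AIGC tools, and iterative self-reflection. The following is a minimal illustrative sketch of that general pattern, not the authors' implementation. Every name here (`ElementSpec`, `PagePlan`, `plan_page`, `reflect`, `build_page`, the placeholder style string, and the string-based "assets") is a hypothetical stand-in; a real system would call a language model for planning and image, video, or chart generators for the elements.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ElementSpec:
    """One multimodal element in the page plan (hypothetical structure)."""
    name: str
    kind: str    # e.g. "image", "video", "chart"
    prompt: str  # element-level prompt derived from the global plan


@dataclass
class PagePlan:
    """Global plan: one shared style plus the elements to generate."""
    style: str
    elements: List[ElementSpec] = field(default_factory=list)


# A generator takes (spec, shared style, attempt number) and returns an asset.
Generator = Callable[[ElementSpec, str, int], str]


def plan_page(request: str) -> PagePlan:
    """Stand-in for a global planner; the style and elements are made up."""
    return PagePlan(
        style="flat, blue palette",
        elements=[
            ElementSpec("hero", "image", f"hero banner for: {request}"),
            ElementSpec("stats", "chart", f"summary chart for: {request}"),
        ],
    )


def default_generator(spec: ElementSpec, style: str, attempt: int) -> str:
    """Stand-in for an AIGC tool call; returns a text placeholder asset."""
    return f"[{spec.kind}:{spec.name}|style={style}]"


def reflect(assets: Dict[str, str], style: str) -> List[str]:
    """Toy consistency check: list elements that do not carry the shared style."""
    return [name for name, asset in assets.items() if style not in asset]


def build_page(
    request: str,
    generate: Generator = default_generator,
    max_rounds: int = 3,
) -> Dict[str, str]:
    """Plan globally, generate each element, then regenerate inconsistent ones."""
    plan = plan_page(request)
    assets = {s.name: generate(s, plan.style, 0) for s in plan.elements}
    for attempt in range(1, max_rounds + 1):
        stale = reflect(assets, plan.style)
        if not stale:
            break  # every element is consistent with the shared style
        for spec in plan.elements:
            if spec.name in stale:
                assets[spec.name] = generate(spec, plan.style, attempt)
    return assets
```

In this sketch the shared style is fixed once by the planner and passed to every generator call, which is one simple way to avoid elements being produced in isolation; the reflection step only re-runs the elements that fail the check, up to `max_rounds` times.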

Code Generation & Program Synthesis, Multimodal Models, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
