Xiaomi AI Lab · Apr 7, 2026 · arXiv:2604.05839

Vision-Guided Iterative Refinement for Frontend Code Generation

Hannah Sansford, Derek H. C. Law, Wei Liu, Abhishek Tripathi, Niresh Agarwal, Gerrit J. J. van den Burg

AI Summary

This paper introduces a fully automated critic-in-the-loop framework for frontend code generation, in which a vision-language model (VLM) provides structured feedback on rendered webpages. Iterative refinement guided by the VLM critic improves solution quality by up to 17.8% over three refinement cycles on the WebDev Arena dataset. Parameter-efficient fine-tuning with LoRA lets the code-generating LLM internalize 25% of the gains of the best critic-in-the-loop setup without a significant increase in token counts, demonstrating the effectiveness of automated visual critique for complex visual outputs.
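The refinement loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate`, `render` and `critique` are hypothetical callables standing in for the code-generating LLM, a headless-browser renderer and the VLM critic, and the `Critique` record and the early stop on empty feedback are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Critique:
    """Hypothetical structured critic output for one rendered page."""
    score: float                                       # scalar quality estimate
    feedback: List[str] = field(default_factory=list)  # actionable issues; empty means nothing to fix


def refine_with_critic(
    request: str,
    generate: Callable[[str, str, List[str]], str],  # (request, previous code, feedback) -> code
    render: Callable[[str], bytes],                  # code -> screenshot bytes
    critique: Callable[[str, bytes], Critique],      # (request, screenshot) -> Critique
    max_cycles: int = 3,
) -> Tuple[str, List[float]]:
    """Critic-in-the-loop refinement: generate, render, critique, regenerate."""
    # Initial single-pass generation: no previous code and no feedback yet.
    code = generate(request, "", [])
    scores: List[float] = []
    for _ in range(max_cycles):
        screenshot = render(code)               # e.g. a PNG from a headless browser
        review = critique(request, screenshot)  # the critic judges the rendered page, not the source
        scores.append(review.score)
        if not review.feedback:                 # assumed stopping rule: critic has nothing to fix
            break
        # Feed the critic's structured feedback back to the code generator.
        code = generate(request, code, review.feedback)
    return code, scores
```

Passing the rendered screenshot, rather than the source code, to the critic is the key design choice: the critic evaluates what the user would actually see. Note that when every cycle produces feedback, the code from the last regeneration is returned without being scored again.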

Key Contribution

Stop wasting human time on frontend code refinement: a VLM-powered critic can automatically guide LLMs to generate significantly better web UIs.

Abstract

Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is effective but very costly, particularly in domains such as frontend web development, where solution quality depends on the rendered visual output. We present a fully automated critic-in-the-loop framework in which a vision-language model serves as a visual critic that provides structured feedback on rendered webpages to guide iterative refinement of generated code. Across real-world user requests from the WebDev Arena dataset, this approach yields consistent improvements in solution quality, achieving an increase in performance of up to 17.8% over three refinement cycles. Next, we investigate parameter-efficient fine-tuning using LoRA to understand whether the improvements provided by the critic can be internalized by the code-generating LLM. Fine-tuning achieves 25% of the gains from the best critic-in-the-loop solution without a significant increase in token counts. Our findings indicate that automated, VLM-based critique of frontend code generation leads to significantly higher-quality solutions than a single LLM inference pass can achieve, and highlight the importance of iterative refinement for the complex visual outputs associated with web development.
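One natural reading of "25% of the gains" is a ratio of improvements measured against the same base model: how much of the critic-in-the-loop improvement the fine-tuned model reproduces on its own. The helper below is a hypothetical sketch of that reading; the function name and the example scores are illustrative assumptions, not figures or definitions taken from the paper.

```python
def fraction_of_gain_recovered(base: float, fine_tuned: float, critic_best: float) -> float:
    """Share of the critic-in-the-loop improvement reproduced by the fine-tuned model.

    Assumes all three scores are on the same scale and measured on the same requests.
    """
    gain = critic_best - base
    if gain <= 0:
        raise ValueError("the critic-in-the-loop score must exceed the base score")
    return (fine_tuned - base) / gain


# Hypothetical scores: base 40, best critic-in-the-loop 48, fine-tuned 42.
print(fraction_of_gain_recovered(40.0, 42.0, 48.0))
```

With these made-up scores, the fine-tuned model recovers 2 of the 8 points gained by the critic loop, i.e. 25% of the gain.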

Code Generation & Program Synthesis · Computer Vision · Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
