East China University of Science and Technology | Shanghai Huahong Grace Semiconductor | Apr 30, 2026 | arXiv:2604.27629

WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning

AI Summary

WaferSAGE is a framework that uses a 4B-parameter vision-language model for wafer defect visual question answering. It addresses data scarcity with a three-stage synthetic data generation pipeline: clustering-based cleaning of noisy labels, defect descriptions generated by a vision-language model, and rubric-guided synthesis of VQA pairs. Curriculum-based reinforcement learning with Group Sequence Policy Optimization (GSPO) and rubric-aligned rewards brings the Qwen3-VL model close to Gemini-3-Flash in performance while allowing on-premise deployment.
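To make the GSPO step more concrete, the following is a minimal sketch of a sequence-level clipped objective for one group of sampled answers to the same prompt. It follows the general GSPO idea (a length-normalized sequence-level importance ratio with group-normalized advantages) and is not the authors' training code. The function names, the `clip_eps` value, and the use of plain Python lists of token log-probabilities are assumptions made for illustration; the rewards stand in for whatever rubric-aligned scores the paper computes.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same: no learning signal from this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def sequence_ratio(new_logprobs, old_logprobs):
    """Length-normalized importance ratio for a whole response.

    Equals (pi_new(y|x) / pi_old(y|x)) ** (1 / len(y)), computed in log space.
    """
    n_tokens = len(new_logprobs)
    delta = sum(new_logprobs) - sum(old_logprobs)
    return math.exp(delta / n_tokens)


def gspo_objective(group, rewards, clip_eps=0.2):
    """Clipped sequence-level surrogate objective for one group.

    group    : list of (new_token_logprobs, old_token_logprobs), one per response
    rewards  : one scalar reward per response (hypothetical rubric scores here)
    clip_eps : clipping range, an illustrative value rather than a tuned one
    """
    advantages = group_advantages(rewards)
    terms = []
    for (new_lp, old_lp), adv in zip(group, advantages):
        ratio = sequence_ratio(new_lp, old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound, as in PPO-style clipping, applied per sequence.
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms)
```

In an actual training loop the log-probabilities would come from the policy and its frozen snapshot, and the objective would be maximized by gradient ascent; this sketch only evaluates its value.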

Key Contribution

A 4B-parameter model, trained on synthetically generated data and refined with rubric-guided reinforcement learning, comes close to Gemini-3-Flash on wafer defect analysis (LLM-Judge score 6.493 vs. 7.149), suggesting that smaller, domain-specific models can rival proprietary LLMs in specialized industrial tasks.

Abstract

We present WaferSAGE, a framework for wafer defect visual question answering using small vision-language models. To address data scarcity in semiconductor manufacturing, we propose a three-stage synthesis pipeline incorporating structured rubric generation for precise evaluation. Starting from limited labeled wafer maps, we employ clustering-based cleaning to filter label noise, then generate comprehensive defect descriptions using vision-language models, which are converted into structured evaluation rubric criteria. These rubrics guide the synthesis of VQA pairs, ensuring coverage across defect type identification, spatial distribution, morphology, and root cause analysis. Our dual assessment framework aligns rule-based metrics with LLM-Judge scores via Bayesian optimization, enabling reliable automated evaluation. Through curriculum-based reinforcement learning with Group Sequence Policy Optimization (GSPO) and rubric-aligned rewards, our 4B-parameter Qwen3-VL model achieves a 6.493 LLM-Judge score, closely approaching Gemini-3-Flash (7.149) while enabling complete on-premise deployment. We demonstrate that small models with domain-specific training can surpass proprietary large models in specialized industrial visual understanding, offering a viable path for privacy-preserving, cost-effective deployment in semiconductor manufacturing.
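One way to picture the alignment of rule-based metrics with LLM-Judge scores is as a search for metric weights whose weighted sum tracks the judge. The sketch below is an assumption-heavy illustration, not the paper's procedure: it uses plain random search as a stand-in for Bayesian optimization, takes Pearson correlation as the alignment objective, and restricts weights to be non-negative and sum to one. All names (`fit_weights`, `combined_score`, `metric_rows`) and the toy data in the usage note are hypothetical.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def combined_score(metrics, weights):
    """Weighted sum of the rule-based metrics for one answer."""
    return sum(w * m for w, m in zip(weights, metrics))


def fit_weights(metric_rows, judge_scores, n_trials=500, seed=0):
    """Search for metric weights that best correlate with judge scores.

    Random search is used here only as a simple substitute for
    Bayesian optimization; the objective (Pearson r) is an assumption.
    """
    rng = random.Random(seed)
    n_metrics = len(metric_rows[0])
    # Start from uniform weights as the baseline candidate.
    best_w = [1.0 / n_metrics] * n_metrics
    best_r = pearson([combined_score(r, best_w) for r in metric_rows], judge_scores)
    for _ in range(n_trials):
        raw = [rng.random() for _ in range(n_metrics)]
        total = sum(raw)
        if total == 0:
            continue
        w = [v / total for v in raw]  # non-negative, sums to 1
        r = pearson([combined_score(row, w) for row in metric_rows], judge_scores)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

As a usage example with made-up numbers, calling `fit_weights([(0.1, 0.9), (0.4, 0.2), (0.7, 0.8), (0.9, 0.1)], [2.0, 4.5, 7.0, 9.0])` should favor the first metric, since only that column rises with the judge scores. A real setup would replace the random search with a proper Bayesian optimizer and evaluate the fitted weights on held-out answers.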

Computer Vision, Data Curation & Synthetic Data, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
