Mar 9, 2026 | arXiv:2603.08069

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

AI Summary

This paper explores using multimodal large language models (MLLMs) to generate synthetic defect images for training power line insulator defect classifiers, addressing data scarcity in this domain. The authors use dual-reference conditioning to increase diversity and human-verified prompt refinement to improve label fidelity, then filter the generated images with an embedding-based selection rule. Augmenting a small real training set (104 images) with the selected synthetic images raises the test F1 score from 0.615 to 0.739, a 20% relative improvement and an estimated 4-5x data-efficiency gain.

Key Contribution

An off-the-shelf MLLM, used without any training, can generate effective synthetic training data for defect classification, raising test F1 by about 20% relative (0.615 to 0.739) with only 104 real training images.
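The 20% figure is a relative gain, not an absolute one. As a quick check, the sketch below recomputes it from the two test F1 values reported in the abstract (0.615 and 0.739). The helper `f1_score` is a hypothetical name that simply restates the harmonic-mean definition; the paper's underlying precision and recall values are not given here, so none are assumed.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test F1 values as reported in the abstract.
baseline_f1 = 0.615   # real training images only
augmented_f1 = 0.739  # real plus embedding-selected synthetic images

absolute_gain = augmented_f1 - baseline_f1
relative_gain = absolute_gain / baseline_f1

print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain:.1%}")
```

The absolute gain is about 0.124 F1 points, which is roughly 20% of the 0.615 baseline.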

Abstract

Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool using an embedding-based selection rule based on distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset with a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training set with embedding-selected synthetic images improves test F1 score (harmonic mean of precision and recall) from 0.615 to 0.739 (20% relative), corresponding to an estimated 4-5x data-efficiency gain, and the gains persist with stronger backbone models and frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.
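The abstract does not spell out the exact selection rule, so the following is only a minimal sketch of one plausible centroid-distance filter. It assumes embeddings are already available as NumPy arrays, and it assumes a rule the abstract does not confirm: a synthetic image is kept when the centroid of its intended class is the nearest centroid and, optionally, lies within a threshold. The names `class_centroids`, `select_synthetic`, and `max_dist`, and the toy 2-D data, are all hypothetical.

```python
import numpy as np


def class_centroids(real_emb, real_labels):
    """Mean embedding per class, computed from the real training split only."""
    return {c: real_emb[real_labels == c].mean(axis=0) for c in np.unique(real_labels)}


def select_synthetic(syn_emb, syn_labels, centroids, max_dist=None):
    """Return a boolean mask of synthetic samples to keep.

    Assumed rule (not taken from the paper): keep a sample when its intended
    class has the nearest centroid and, if max_dist is given, that nearest
    centroid is at most max_dist away (Euclidean distance).
    """
    classes = np.array(list(centroids))
    stacked = np.stack([centroids[c] for c in classes])             # (k, d)
    dists = np.linalg.norm(syn_emb[:, None, :] - stacked, axis=2)   # (n, k)
    keep = classes[dists.argmin(axis=1)] == syn_labels
    if max_dist is not None:
        keep &= dists.min(axis=1) <= max_dist
    return keep


# Toy 2-D "embeddings" standing in for real image features.
real_emb = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
real_labels = np.array(["shell", "shell", "glaze", "glaze"])

syn_emb = np.array([[1.0, 1.0], [9.0, 1.0], [6.0, 1.0], [0.0, 8.0]])
syn_labels = np.array(["shell", "glaze", "shell", "shell"])

centroids = class_centroids(real_emb, real_labels)
mask_nearest = select_synthetic(syn_emb, syn_labels, centroids)
mask_thresholded = select_synthetic(syn_emb, syn_labels, centroids, max_dist=5.0)
print(mask_nearest, mask_thresholded)
```

In this toy run the third sample is dropped because it sits closer to the other class's centroid, and the fourth is dropped only when the distance threshold is applied. Computing centroids from the real training split alone, as the abstract describes, keeps the validation and test images out of the filtering step.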

Computer Vision | Data Curation & Synthetic Data | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
