This paper explores using multimodal large language models (MLLMs) to generate synthetic defect images for training power line insulator defect classifiers, addressing data scarcity in this domain. The authors use dual-reference conditioning and prompt refinement to improve the diversity and label fidelity of the generated images, then filter the synthetic pool with an embedding-based selection rule. Augmenting a small real training set (104 images) with the selected synthetic images raises test F1 from 0.615 to 0.739, a relative improvement of about 20% and an estimated 4-5x data-efficiency gain.
MLLMs can generate effective synthetic training data for defect classification, improving test F1 by about 20% (relative) even with very limited real data.
Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool using an embedding-based selection rule that uses distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset with a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training set with embedding-selected synthetic images improves test F1 score (harmonic mean of precision and recall) from 0.615 to 0.739 (20% relative), corresponding to an estimated 4-5x data-efficiency gain, and the gains persist with stronger backbone models and frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.
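The abstract does not spell out the exact selection rule, the embedding model, or any threshold, so the following is only a minimal sketch of one plausible centroid-distance filter. It assumes embeddings have already been extracted as NumPy arrays, uses Euclidean distance, and keeps a synthetic image only if its nearest real-data centroid matches its intended label and it falls within the closest `keep_fraction` of its class. The function names, the `keep_fraction` parameter, and the choice of distance are illustrative assumptions, not the authors' method.

```python
import numpy as np


def class_centroids(real_emb, real_labels):
    """Mean embedding per class, computed from the real training split only."""
    classes = np.unique(real_labels)  # sorted unique labels
    centroids = np.stack(
        [real_emb[real_labels == c].mean(axis=0) for c in classes]
    )
    return classes, centroids


def select_synthetic(syn_emb, syn_labels, classes, centroids, keep_fraction=0.5):
    """Return sorted indices of synthetic samples that pass the filter.

    Assumed rule (hypothetical, for illustration):
      1. the nearest centroid must belong to the sample's intended class;
      2. within each class, keep the closest `keep_fraction` of survivors.
    """
    # Pairwise Euclidean distances, shape (n_synthetic, n_classes).
    dists = np.linalg.norm(syn_emb[:, None, :] - centroids[None, :, :], axis=2)
    nearest = classes[dists.argmin(axis=1)]

    kept = []
    for col, c in enumerate(classes):
        # Samples generated for class c that also land nearest to centroid c.
        idx = np.flatnonzero((syn_labels == c) & (nearest == c))
        n_keep = int(np.ceil(keep_fraction * idx.size))
        # Order the survivors by distance to their own class centroid.
        order = idx[np.argsort(dists[idx, col])]
        kept.extend(order[:n_keep].tolist())
    return sorted(kept)
```

In this sketch the centroids come only from real images, so a synthetic image that drifts toward the wrong defect type (for example, a generated "glaze" image that embeds closer to the "shell" centroid) is dropped before training. The reported relative gain can be checked directly from the stated scores: (0.739 - 0.615) / 0.615 is approximately 0.20.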