This paper reviews the use of generative models for data augmentation in computer vision, focusing on techniques that leverage enhanced conditioning mechanisms to generate both images and labels. It highlights the critical issue of data leakage when using large pre-trained models like Stable Diffusion and proposes controllable sampling strategies to mitigate this. The review also discusses quality assessment procedures for generated images, such as FID, CLIP-score, and IoU, and suggests future research directions including data-scarce domains and improved sampling strategies.
Generative data augmentation can leak information from pretraining sets or sample from the wrong distribution in few-shot learning, so sampling needs to be controlled.
Modern generative models produce high-quality samples, and their use is now considered alongside classic data augmentation techniques. The paper reviews existing approaches to data augmentation with generative models in computer vision tasks. The reviewed pipelines use the enhanced conditioning mechanisms of modern generative models to produce both images and labels for tasks including image captioning, classification, object detection, and segmentation.

Data leak prevention is crucial when large pretrained generative models are used for augmentation. Models like Stable Diffusion were trained on billions of publicly available images, which might also be present in popular datasets. A potential methodological weakness in the application to few-shot learning tasks was identified: images generated from textual prompts are sampled from all possible images of a given class, not only from the few given training examples. Controllable sampling should be introduced to prevent possible data leaks.

Despite the high overall quality of generated images, recent diffusion models are still error-prone when generating complex scenes. To mitigate this, various quality assessment procedures are applied to generated images. These include visual naturalness evaluation with Fréchet Inception Distance (FID), prompt correspondence control via CLIP-score, and, for segmentation masks, intersection-over-union (IoU) with ground truth or with the predictions of auxiliary segmentation models.

Further use of generative augmentation in data-scarce domains such as medical imaging is needed. To achieve this, it is preferable to eliminate auxiliary predictors from generation quality assessment. It is proposed to compare generated and natural segmentation masks with the FID-score.

Another area for further research is data augmentation sampling strategies that are less dependent on specific generative pipelines and can therefore be considered separately. Targeted augmentation of hard-to-predict examples is more effective than uniform sampling. It is planned to improve the sampling strategies from the reviewed papers in future research.
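
As an illustration of the IoU-based quality check mentioned above, the following is a minimal sketch, not the procedure of any reviewed paper. It assumes binary masks stored as nested lists of 0/1 values, and the names `mask_iou` and `filter_by_iou` as well as the 0.5 threshold are hypothetical choices made for this example only. A generated sample is kept when its mask agrees closely enough with a reference mask, which may be ground truth or the prediction of an auxiliary segmentation model.

```python
from typing import List, Sequence, Tuple

# Assumption: a binary mask is a sequence of rows, each a sequence of 0/1 values.
Mask = Sequence[Sequence[int]]


def mask_iou(generated: Mask, reference: Mask) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for row_g, row_r in zip(generated, reference):
        for g, r in zip(row_g, row_r):
            if g and r:
                intersection += 1
            if g or r:
                union += 1
    # Two empty masks agree perfectly, so IoU is defined as 1.0 here.
    return intersection / union if union else 1.0


def filter_by_iou(pairs: Sequence[Tuple[Mask, Mask]], threshold: float = 0.5) -> List[int]:
    """Return indices of (generated, reference) pairs whose IoU meets the threshold.

    The default threshold of 0.5 is an illustrative assumption, not a value
    taken from the reviewed papers.
    """
    return [i for i, (gen, ref) in enumerate(pairs) if mask_iou(gen, ref) >= threshold]


if __name__ == "__main__":
    good = ([[1, 1], [0, 0]], [[1, 1], [0, 0]])  # identical masks
    poor = ([[1, 0], [0, 0]], [[0, 0], [0, 1]])  # no overlap
    print(filter_by_iou([good, poor]))
```

Note that when the reference mask comes from an auxiliary model, this check inherits that model's errors, which is the dependence the abstract suggests removing for data-scarce domains.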