This paper introduces a system for generating synthetic training data for object detection and segmentation tasks in autonomous driving using Stable Diffusion, CVAT, SAM, and BLIP. The system generates photorealistic images, annotates them automatically, and produces descriptive image captions. The resulting synthetic dataset is used to train a YOLOv8 model, achieving performance comparable to or exceeding that of models trained on real-world data.
Synthetic data generated via Stable Diffusion and SAM can match or exceed the performance of real-world data for training YOLOv8 object detection models in autonomous driving scenarios.
Training models for object detection and segmentation in autonomous driving typically relies on real-world data, which is costly to annotate, difficult to acquire, and heterogeneous. We present a system based on deep learning and generative AI for producing high-quality synthetic data to overcome these challenges. Our system generates photorealistic synthetic images with Stable Diffusion models and annotates them using the CVAT annotation tool and the Segment Anything Model (SAM). After cleaning the annotations and converting them into segmentation masks, we split the data into training and validation subsets. We then train YOLOv8 on object detection and segmentation tasks with this dataset, which lets us evaluate the quality of the generated synthetic data. In addition, we use Salesforce's BLIP image-captioning model to produce rich, descriptive image captions. In our experiments, models trained on synthetic data perform comparably to, or even better than, models trained on real data. These results demonstrate the strong potential of synthetic datasets as a cost-effective and scalable way to train perception systems for autonomous vehicles, particularly for rare or challenging driving scenarios.
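To illustrate the annotation-conversion step described above, here is a minimal sketch of turning a binary segmentation mask (such as one produced by SAM) into a YOLO-format detection label line. The helper name `mask_to_yolo_bbox` is hypothetical, not from the paper; the label layout (`class x_center y_center width height`, normalized to [0, 1]) follows the standard YOLO convention.

```python
import numpy as np

def mask_to_yolo_bbox(mask: np.ndarray, class_id: int = 0) -> str:
    """Convert a binary mask of shape (H, W) into a YOLO detection label:
    'class x_center y_center width height', all coordinates normalized."""
    ys, xs = np.nonzero(mask)          # pixel coordinates of the object
    if xs.size == 0:
        raise ValueError("mask contains no foreground pixels")
    h, w = mask.shape
    x0, x1 = xs.min(), xs.max() + 1    # tight bounding box (exclusive max)
    y0, y1 = ys.min(), ys.max() + 1
    xc = (x0 + x1) / 2 / w             # normalized box center
    yc = (y0 + y1) / 2 / h
    bw = (x1 - x0) / w                 # normalized box size
    bh = (y1 - y0) / h
    return f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"

# Toy example: a 4x4 foreground square inside a 10x10 image.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:6, 3:7] = 1
print(mask_to_yolo_bbox(mask))  # → "0 0.500000 0.400000 0.400000 0.400000"
```

In the full pipeline, one such label line per object instance would be written to a `.txt` file alongside each synthetic image before the train/validation split and YOLOv8 training.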