The paper introduces an approach to enhancing the realism of synthetic images for autonomous vehicle applications using ControlNet-based diffusion models with upscaling. The authors apply multiple ControlNet signals simultaneously (edge, depth, segmentation, tile resampling), together with text-guided prompts extracted via LLMs, to control the generative process and bridge the domain gap between synthetic and real-world data. Experiments on the VKITTI dataset, validated with FID, SSIM, and LPIPS and with YOLO-v8 object detection and classification, demonstrate that the method improves the realism and usability of synthetic data.
By combining ControlNet signals with LLM-extracted text prompts, synthetic data can be made significantly more realistic, bridging the domain gap for autonomous driving applications.
In this work, we present an approach that uses ControlNet-based diffusion models with upscaling capabilities for domain adaptation and quality refinement of 3D-modelled synthetic datasets, focusing on autonomous vehicle applications. A significant domain gap often exists between synthetic and real-world data, which limits the applicability of deep learning models trained on synthetic data to real-world scenarios. Our methodology leverages the strengths of Controlled Augmentation by simultaneously utilizing multiple ControlNet signals, including edge detection, depth information, segmentation maps, and tile resampling. These signals guide the generative process so that the synthetic data aligns more closely with the desired domain specifications. We also incorporate text-guided prompts extracted via Large Language Models (LLMs) to improve control over the synthesis of desired features and attributes. We test the approach on diverse environmental conditions from the VKITTI dataset, a well-known 3D-modelled synthetic dataset generated in Unity for autonomous driving research. The refined data is validated using quantitative metrics including FID, SSIM, and LPIPS, and is also evaluated on the downstream machine learning tasks of object detection and classification, using YOLO-v8, to ensure its utility and effectiveness. Experimental analysis demonstrates the effectiveness of this method in improving the realism and usability of synthetic data. Our approach contributes to fields that require high-quality data synthesis and domain adaptation. The experimental work, along with the ControlNet models used in this project, is available online.
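To make one of the validation metrics named above concrete, the following is a minimal sketch of SSIM in its simplest, single-window form. It is an illustration, not the authors' evaluation code: the function name `global_ssim`, the flattened-grayscale input format, and the default `data_range` of 255 are assumptions made for this example. Standard implementations typically average SSIM over local sliding windows rather than computing it once over the whole image.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 255.0) -> float:
    """Single-window SSIM over two flattened grayscale images.

    Hypothetical helper for illustration; it treats the whole image
    as one window instead of averaging over local windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")

    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n

    # Population variances and covariance of the pixel intensities.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Stabilizing constants with the conventional K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    synthetic = [0.0, 255.0, 0.0, 255.0]
    refined = [10.0, 240.0, 5.0, 250.0]
    print(global_ssim(synthetic, refined))
```

A score near 1 indicates that the two images share similar luminance, contrast, and structure; identical images score 1, and an image compared with its intensity-inverted copy scores below 0. FID and LPIPS, by contrast, rely on features from pretrained networks and cannot be reduced to a closed-form sketch like this one.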