This paper introduces a two-step approach for pre-training deep learning models on synthetic data, reducing the need for large real-image datasets. First, the authors propose an improved neural fractal formulation that generates a new class of synthetic data. Second, they introduce reverse stylization, a technique that transfers visual features from real images onto the synthetic data. Experiments show that pre-training models on the resulting synthetic dataset reduces the domain gap to real images, improving performance in image generation, data representation, and image classification tasks.
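The paper's improved neural fractal formulation is not reproduced on this page; purely as an illustration of how fractal images can be synthesized procedurally with no real data, the sketch below runs the chaos game on the classic Barnsley-fern iterated function system (four affine maps applied in random order). The map parameters and function names here are standard textbook values, not the paper's method.

```python
import numpy as np

# Classic Barnsley-fern IFS: (2x2 matrix, offset, selection probability).
# Illustrative only -- the paper's neural fractal formulation differs.
MAPS = [
    (np.array([[0.00, 0.00], [0.00, 0.16]]), np.array([0.0, 0.00]), 0.01),
    (np.array([[0.85, 0.04], [-0.04, 0.85]]), np.array([0.0, 1.60]), 0.85),
    (np.array([[0.20, -0.26], [0.23, 0.22]]), np.array([0.0, 1.60]), 0.07),
    (np.array([[-0.15, 0.28], [0.26, 0.24]]), np.array([0.0, 0.44]), 0.07),
]

def ifs_points(n_points=10000, seed=0):
    """Generate fractal point samples by iterating randomly chosen maps."""
    rng = np.random.default_rng(seed)
    probs = [p for _, _, p in MAPS]
    point = np.zeros(2)
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        mat, off, _ = MAPS[rng.choice(len(MAPS), p=probs)]
        point = mat @ point + off  # apply one affine map
        pts[i] = point
    return pts
```

Rasterizing such point clouds (and, in the paper's pipeline, stylizing them with features from a small set of real images) yields label-free synthetic training images.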
Forget GANs: this new method uses stylized fractals to generate synthetic training data that narrows the domain gap to real images and boosts ImageNet-100 accuracy by over 10% when pre-training ViT-S.
Modern deep learning models in computer vision require large datasets of real images, which are difficult to curate and pose privacy and legal concerns, limiting their commercial use. Recent works suggest synthetic data as an alternative, yet models trained with it often underperform. This paper proposes a two-step approach to bridge this gap. First, we propose an improved neural fractal formulation through which we introduce a new class of synthetic data. Second, we propose reverse stylization, a technique that transfers visual features from a small, license-free set of real images onto synthetic datasets, enhancing their effectiveness. We analyze the domain gap between our synthetic datasets and real images using Kernel Inception Distance (KID) and show that our method achieves a significantly lower distributional gap than existing synthetic datasets. Our experiments across different tasks demonstrate the practical impact of this reduced gap. Pre-training the EDM2 diffusion model on our synthetic dataset leads to an 11% reduction in FID for image generation, compared to models trained on existing synthetic datasets, and a 20% decrease in autoencoder reconstruction error, indicating improved performance in data representation. Finally, a ViT-S model trained for classification on this synthetic data achieves over a 10% improvement in ImageNet-100 accuracy. Our work opens up exciting possibilities for training practical models when sufficiently large real training sets are not available.
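The abstract measures the domain gap with Kernel Inception Distance (KID). For readers unfamiliar with the metric, KID is the squared maximum mean discrepancy between two sets of Inception feature vectors under the polynomial kernel k(x, y) = (x·y/d + 1)^3; the sketch below implements the standard unbiased estimator on raw feature arrays. The paper's exact evaluation setup (feature extractor, subset sizes) is not given here, so treat this as a generic illustration.

```python
import numpy as np

def kid(feats_real, feats_gen):
    """Unbiased MMD^2 estimate with the cubic polynomial kernel
    k(x, y) = (x.y / d + 1)^3, as used by Kernel Inception Distance.
    feats_real, feats_gen: (n, d) arrays of image feature vectors.
    """
    n, d = feats_real.shape
    m = feats_gen.shape[0]
    k_xx = (feats_real @ feats_real.T / d + 1) ** 3
    k_yy = (feats_gen @ feats_gen.T / d + 1) ** 3
    k_xy = (feats_real @ feats_gen.T / d + 1) ** 3
    # Unbiased estimator: exclude the diagonal of the within-set kernels.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2 * k_xy.mean()
```

A lower KID between synthetic and real feature distributions indicates a smaller domain gap, which is the quantity the paper reports improving over prior synthetic datasets.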