Jan 13, 2026 | arXiv:2601.08095

From Prompts to Deployment: Auto-Curated Domain-Specific Dataset Generation via Diffusion Models

AI Summary

This paper introduces a three-stage pipeline for automated domain-specific dataset generation using diffusion models to mitigate distribution shift challenges in real-world deployment. The pipeline leverages controlled inpainting to synthesize objects within specific backgrounds, validates outputs using multi-modal assessment (object detection, aesthetic scoring, vision-language alignment), and incorporates a user-preference classifier. The framework enables efficient creation of high-quality datasets tailored to deployment environments, reducing the need for extensive real-world data.

Key Contribution

Rather than painstakingly collecting real-world data, this pipeline auto-curates high-quality, domain-specific datasets using diffusion models and multi-modal validation, producing data that is ready for deployment.

Abstract

In this paper, we present an automated pipeline for generating domain-specific synthetic datasets with diffusion models, addressing the distribution shift between pre-trained models and real-world deployment environments. Our three-stage framework first synthesizes target objects within domain-specific backgrounds through controlled inpainting. The generated outputs are then validated via a multi-modal assessment that integrates object detection, aesthetic scoring, and vision-language alignment. Finally, a user-preference classifier is employed to capture subjective selection criteria. This pipeline enables the efficient construction of high-quality, deployable datasets while reducing reliance on extensive real-world data collection.
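The second and third stages described in the abstract amount to a filtering step over generated images. The following is a minimal sketch of that step, not the authors' implementation. It assumes each image already carries three precomputed scores (detector confidence, aesthetic score, vision-language alignment), that the three checks are combined with a logical AND, and that the user-preference classifier can be treated as a boolean callable. All names (`Candidate`, `Thresholds`, `curate`) and all threshold values are hypothetical; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """One generated image with its precomputed stage-2 scores (hypothetical schema)."""
    image_id: str
    detection_conf: float   # detector confidence that the target object is present
    aesthetic_score: float  # output of some aesthetic predictor
    alignment_score: float  # image-text similarity from a vision-language model


@dataclass(frozen=True)
class Thresholds:
    # Assumed cut-offs for illustration only; the source gives no numeric values.
    min_detection: float = 0.5
    min_aesthetic: float = 5.0
    min_alignment: float = 0.25


def passes_validation(c: Candidate, t: Thresholds) -> bool:
    """Stage 2: keep an image only if all three checks pass (AND is an assumption)."""
    return (
        c.detection_conf >= t.min_detection
        and c.aesthetic_score >= t.min_aesthetic
        and c.alignment_score >= t.min_alignment
    )


def curate(
    candidates: Iterable[Candidate],
    thresholds: Thresholds,
    user_prefers: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Apply stage 2 (multi-modal validation), then stage 3 (user preference)."""
    validated = [c for c in candidates if passes_validation(c, thresholds)]
    return [c for c in validated if user_prefers(c)]


if __name__ == "__main__":
    pool = [
        Candidate("img_a", 0.9, 6.0, 0.30),
        Candidate("img_b", 0.2, 7.0, 0.40),  # fails the detection check
        Candidate("img_c", 0.8, 5.5, 0.30),
    ]
    # Stand-in for a trained preference classifier: rejects img_c.
    kept = curate(pool, Thresholds(), lambda c: c.image_id != "img_c")
    print([c.image_id for c in kept])
```

Running validation before the preference classifier means the subjective filter only sees images that already meet the objective checks, which matches the stage order given in the abstract.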

Citation Metrics

Citations: 0

Influential citations: 0

References: 19

Year: 2026

Venue: N/A
