This paper introduces StreamSynth, a new setting in which LLMs learn to synthesize data from a stream of sequential tasks, using experience from past tasks to inform future synthesis. The authors propose SynLearner, a framework that encourages the model to explore diverse synthesis patterns, learn from feedback, and balance sample quality with set-level diversity. Experiments across multiple benchmarks show that SynLearner transfers knowledge from earlier tasks to improve synthesis performance on later ones, providing evidence that experience-driven synthetic data generation is feasible.
LLMs can learn to synthesize data more effectively by accumulating and transferring experience across a stream of sequential synthesis tasks, opening the door to more efficient and adaptable synthetic data generation.
Large language models (LLMs) have been widely adopted for synthetic data generation, significantly reducing annotation costs. However, most existing studies treat synthesis as a set of isolated tasks and overlook a more fundamental question: whether a model can learn to synthesize by accumulating experience from past tasks and transferring it to future ones. In this work, we introduce StreamSynth, a new setting in which synthesis tasks arrive sequentially and experience from historical tasks provides informative signals for future synthesis. To address this setting, we propose SynLearner, a general framework that enables synthesis models to acquire reusable synthesis experience over a task stream. Instead of generating data independently for each task, SynLearner encourages the model to explore diverse synthesis patterns, learn from feedback, and balance sample quality with set-level diversity as tasks evolve. Extensive experiments across multiple benchmarks show that SynLearner effectively leverages experience from earlier tasks to improve synthesis performance on later ones, exhibiting consistent cross-task transferability. These findings provide evidence for the feasibility of StreamSynth and highlight synthetic data generation as an experience-driven process that can benefit from task streams.
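To make the idea of balancing sample quality with set-level diversity concrete, here is a minimal sketch. It is not SynLearner's method, which the abstract does not specify. It substitutes a generic greedy, maximal-marginal-relevance-style selection, and every name in it (`select_balanced`, `jaccard`, `trade_off`) is a hypothetical chosen for illustration. The quality scores are assumed to come from some external feedback signal.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_balanced(
    candidates: Sequence[str],
    quality: Sequence[float],
    k: int,
    trade_off: float = 0.5,
    similarity: Callable[[str, str], float] = jaccard,
) -> List[int]:
    """Greedily pick up to k candidate indices, trading quality against redundancy.

    Illustrative assumption, not the paper's algorithm: each step picks the
    candidate that maximizes
        trade_off * quality - (1 - trade_off) * (max similarity to chosen items).
    """
    if len(candidates) != len(quality):
        raise ValueError("candidates and quality must have the same length")
    chosen: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(chosen) < k:
        def score(i: int) -> float:
            # Redundancy is the similarity to the closest already-chosen sample.
            redundancy = max(
                (similarity(candidates[i], candidates[j]) for j in chosen),
                default=0.0,
            )
            return trade_off * quality[i] - (1.0 - trade_off) * redundancy

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat", "a dog ran fast"]
    scores = [0.9, 0.85, 0.6]
    print(select_balanced(samples, scores, k=2))
```

With `trade_off=0.5`, the duplicate second sample is skipped in favor of the lower-scoring but distinct third one. Setting `trade_off=1.0` reduces the selection to ranking by quality alone.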