The paper investigates how combinatorial complexity affects diffusion models. It argues that existing training schemes insufficiently cover the space spanned by combinations of data dimensions and attributes, which limits performance. To address this, the authors introduce ComboStoc, a training scheme that uses stochastic processes designed to fully exploit these combinatorial structures. Experiments on image and 3D structured shape generation show accelerated training, and the approach also enables a new asynchronous generation method that offers dimension- and attribute-specific control.
Diffusion models train faster and offer finer-grained control when training explicitly accounts for the combinatorial structure of data dimensions and attributes.
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., combinatorial complexity. Data samples are generally high-dimensional, and in various structured generation tasks, additional attributes are associated with data samples. We show that the space spanned by the combination of dimensions and attributes can be insufficiently covered by existing training schemes of diffusion generative models, potentially limiting test-time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new form of test-time generation that uses asynchronous time steps for different dimensions and attributes, thus allowing for varying degrees of control over them. Our code is available at: https://github.com/Xrvitd/ComboStoc
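The abstract names two mechanisms: training on time values that vary across dimensions and attributes, and generation with asynchronous per-element time steps. The following is a minimal sketch of both ideas, not the ComboStoc implementation. It rests on several assumptions: a sample is a D × A grid (D dimensions such as patches or parts, A attributes); noise and data are joined by a linear path x_t = (1 − t)·x0 + t·x1 in the style of flow matching; the four time-sampling granularities in the hypothetical `sample_time_grid` are illustrative choices; and a point-mass "oracle" velocity stands in for a trained network. All function names are hypothetical.

```python
import random
from typing import Callable, List

Grid = List[List[float]]  # D rows (dimensions) x A columns (attributes)


def sample_time_grid(num_dims: int, num_attrs: int, rng: random.Random) -> Grid:
    """Hypothetical combinatorial time sampling: choose a granularity at random,
    then draw independent times at that granularity."""
    mode = rng.choice(["sample", "dimension", "attribute", "element"])
    if mode == "sample":  # conventional scheme: one t for the whole sample
        t = rng.random()
        return [[t] * num_attrs for _ in range(num_dims)]
    if mode == "dimension":  # one t per dimension, shared by its attributes
        ts = [rng.random() for _ in range(num_dims)]
        return [[ts[d]] * num_attrs for d in range(num_dims)]
    if mode == "attribute":  # one t per attribute, shared by all dimensions
        ts = [rng.random() for _ in range(num_attrs)]
        return [list(ts) for _ in range(num_dims)]
    # "element": an independent t for every (dimension, attribute) cell
    return [[rng.random() for _ in range(num_attrs)] for _ in range(num_dims)]


def make_training_pair(x1: Grid, rng: random.Random):
    """Return (x_t, t, target velocity) on the linear path with per-element t."""
    D, A = len(x1), len(x1[0])
    t = sample_time_grid(D, A, rng)
    x0 = [[rng.gauss(0.0, 1.0) for _ in range(A)] for _ in range(D)]
    xt = [[(1 - t[d][a]) * x0[d][a] + t[d][a] * x1[d][a] for a in range(A)]
          for d in range(D)]
    # d x_t / d t = x1 - x0 elementwise, so the target is independent of t
    target_v = [[x1[d][a] - x0[d][a] for a in range(A)] for d in range(D)]
    return xt, t, target_v


def async_euler(x: Grid, t_start: Grid,
                velocity_fn: Callable[[Grid, Grid], Grid], num_steps: int) -> Grid:
    """Euler-integrate every element from its own start time to t = 1.
    Elements with t_start == 1 are treated as given and never move."""
    D, A = len(x), len(x[0])
    dt = [[(1.0 - t_start[d][a]) / num_steps for a in range(A)] for d in range(D)]
    t = [row[:] for row in t_start]
    x = [row[:] for row in x]
    for _ in range(num_steps):
        v = velocity_fn(x, t)  # the model sees the full per-element time grid
        for d in range(D):
            for a in range(A):
                x[d][a] += dt[d][a] * v[d][a]
                t[d][a] += dt[d][a]
    return x


def point_mass_velocity(target: Grid) -> Callable[[Grid, Grid], Grid]:
    """Exact velocity when the data distribution is the single point `target`;
    an illustrative stand-in for a trained network."""
    def velocity(x: Grid, t: Grid) -> Grid:
        return [[(target[d][a] - x[d][a]) / (1.0 - t[d][a]) if t[d][a] < 1.0 else 0.0
                 for a in range(len(x[0]))] for d in range(len(x))]
    return velocity


rng = random.Random(0)
target = [[1.0, -2.0], [0.5, 3.0]]
xt, t, v = make_training_pair(target, rng)

# Asynchronous generation: cell (0, 0) is given (t = 1); the others start from noise (t = 0).
t_start = [[1.0, 0.0], [0.0, 0.0]]
x_init = [[1.0, rng.gauss(0.0, 1.0)], [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]]
out = async_euler(x_init, t_start, point_mass_velocity(target), num_steps=10)
```

In this sketch, holding a cell at t = 1 fixes it completely, while starting a cell at an intermediate t (from a suitably noised reference value) would constrain it only partially. This is one way per-element time steps can yield varying degrees of control. Because the network is conditioned on the whole time grid instead of a single scalar, training must cover mixed-time inputs, and that is what the varied granularities in `sample_time_grid` are meant to provide.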