The paper introduces Dual-Solver, an ODE solver for diffusion models that uses learnable parameters to continuously interpolate among prediction types, select the integration domain, and adjust residual terms within a predictor-corrector framework. These parameters are optimized with a classification-based objective computed by a frozen pretrained classifier. Dual-Solver improves FID and CLIP scores in the low-NFE regime (3 to 9 function evaluations) across several diffusion backbones (DiT, GM-DiT, SANA, and PixArt-α), which indicates that sampling cost can be reduced while image quality is maintained.
Cut diffusion model sampling costs while maintaining image quality by learning how to integrate the sampling ODE.
Diffusion models achieve state-of-the-art image quality. However, sampling is costly at inference time because it requires a large number of function evaluations (NFEs). To reduce NFEs, classical ODE numerical methods have been adopted. Yet, the choice of prediction type and integration domain leads to different sampling behaviors. To address these issues, we introduce Dual-Solver, which generalizes multistep samplers through learnable parameters that continuously (i) interpolate among prediction types, (ii) select the integration domain, and (iii) adjust the residual terms. It retains the standard predictor-corrector structure while preserving second-order local accuracy. These parameters are learned via a classification-based objective using a frozen pretrained classifier (e.g., MobileNet or CLIP). For ImageNet class-conditional generation (DiT, GM-DiT) and text-to-image generation (SANA, PixArt-$\alpha$), Dual-Solver improves FID and CLIP scores in the low-NFE regime ($3 \le$ NFE $\le 9$) across backbones.
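
To make the predictor-corrector structure and the NFE budget concrete, here is a minimal sketch of a generic second-order predictor-corrector scheme (Heun's method) on a toy scalar ODE. It is not Dual-Solver's update rule, which the abstract does not spell out: it has no learnable parameters, no choice of prediction type or integration domain, and no multistep history. The names `heun_solve` and `decay` are hypothetical and used only for illustration, and the toy ODE stands in for the far more expensive network evaluation in a diffusion sampler.

```python
import math


def heun_solve(f, x0, t0, t1, n_steps):
    """Integrate dx/dt = f(x, t) from t0 to t1 with a predictor-corrector scheme.

    Returns the final state and the number of function evaluations (NFE).
    Illustrative only: a plain Heun scheme, not the method from the paper.
    """
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    nfe = 0
    for _ in range(n_steps):
        k1 = f(x, t)
        nfe += 1
        x_pred = x + h * k1              # predictor: explicit Euler step
        k2 = f(x_pred, t + h)
        nfe += 1
        x = x + 0.5 * h * (k1 + k2)      # corrector: trapezoidal rule
        t += h
    return x, nfe


def decay(x, t):
    """Toy ODE dx/dt = -x, with exact solution x(t) = x0 * exp(-t)."""
    return -x


exact = math.exp(-1.0)
x_coarse, nfe_coarse = heun_solve(decay, 1.0, 0.0, 1.0, n_steps=4)
x_fine, nfe_fine = heun_solve(decay, 1.0, 0.0, 1.0, n_steps=8)

err_coarse = abs(x_coarse - exact)
err_fine = abs(x_fine - exact)

print(f"NFE={nfe_coarse}: error={err_coarse:.2e}")
print(f"NFE={nfe_fine}: error={err_fine:.2e}")
print(f"error ratio: {err_coarse / err_fine:.2f}")
```

For a second-order scheme, halving the step size should shrink the global error by roughly a factor of four, at the price of doubling the NFE. That trade-off is why the low-NFE regime is hard: with only a handful of evaluations, how each step is formed matters far more than it does with a large budget.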