Feb 24, 2026 · arXiv:2602.21185

The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum

Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo

AI Summary

The paper introduces a family of Predictor-Corrector (PC) samplers, termed $\Psi$-samplers, for discrete diffusion models. These samplers generalize existing methods and apply to arbitrary noise processes. Paired with uniform-state diffusion, they surpass ancestral sampling on language and image modeling, achieving lower generative perplexity and better FID/IS scores, respectively. Unlike ancestral sampling, the PC samplers keep improving as the number of sampling steps increases, which challenges the assumed dominance of Masked diffusion in language modeling. The paper also introduces a memory-efficient curriculum for the Gaussian relaxation training phase.

Key Contribution

With a new Predictor-Corrector sampler, uniform-state diffusion models break the quality plateau of ancestral sampling and keep improving as sampling steps increase, achieving lower generative perplexity than ancestral sampling in language modeling.

Abstract

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: https://s-sahoo.com/duo-ch2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 74

Year: 2026

Venue: N/A
