This paper introduces Recursive Masked Diffusion Models (R-MDMs), which add recursive depth as a third scaling axis for masked diffusion models, alongside parameter count and the number of denoising steps. By reapplying the same denoising transformer within each diffusion step, R-MDMs increase effective depth without adding parameters: on structured generation tasks such as Sudoku and Countdown, a model with $L$ recursive iterations often matches non-recursive baselines with roughly $L\times$ more parameters. Recursive refinement can also partially substitute for additional denoising steps, so recursive models reach the same generation quality with fewer forward passes at inference time.
Recursive depth in masked diffusion models can substantially improve parameter efficiency, enabling models to match much larger non-recursive counterparts without additional parameters.
Masked diffusion models (MDMs) have recently emerged as a promising paradigm for sequence generation. Scaling MDMs is conventionally achieved by increasing the parameter count or the number of denoising steps. We introduce Recursive Masked Diffusion Models (R-MDMs), which add recursive depth as a third scaling axis by repeatedly applying the same denoising transformer within each diffusion step. Recursion enables iterative refinement of the output through parameter reuse, increasing effective model depth without increasing parameter count. Across structured generation tasks, including Sudoku and Countdown, we show that R-MDMs achieve substantially improved parameter efficiency: a model with $L$ recursive iterations often matches the performance of non-recursive baselines with roughly $L\times$ more parameters. Moreover, recursive refinement can partially substitute for additional denoising steps, allowing recursive models to reach the same generation quality with fewer forward passes at inference time. These results suggest that recursive depth is a practically useful scaling mechanism for MDMs, improving both parameter efficiency and the allocation of test-time compute.
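The mechanism described in the abstract, reapplying one denoiser several times inside each diffusion step, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `MASK`, `refine`, `sample`, `toy_embed`, and `toy_block` are hypothetical, as are the confidence-based unmasking rule and the choice to feed each pass's output directly into the next pass. The toy block stands in for a denoising transformer only to keep the example runnable.

```python
from typing import Callable, List

MASK = -1  # hypothetical id marking a masked position
VOCAB = 3  # hypothetical toy vocabulary size

Hidden = List[List[float]]  # one score vector per sequence position


def refine(block: Callable[[Hidden], Hidden], hidden: Hidden,
           num_recursions: int) -> Hidden:
    """Apply the same block (same parameters) num_recursions times."""
    for _ in range(num_recursions):
        hidden = block(hidden)
    return hidden


def sample(block: Callable[[Hidden], Hidden],
           embed: Callable[[List[int]], Hidden],
           length: int, num_steps: int, num_recursions: int) -> List[int]:
    """Toy masked-diffusion sampler with recursion inside each step."""
    tokens = [MASK] * length
    per_step = -(-length // num_steps)  # ceiling division
    for _ in range(num_steps):
        hidden = refine(block, embed(tokens), num_recursions)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Assumption: unmask the positions with the highest top score first.
        masked.sort(key=lambda i: max(hidden[i]), reverse=True)
        for i in masked[:per_step]:
            row = hidden[i]
            tokens[i] = max(range(len(row)), key=row.__getitem__)
    return tokens


def toy_embed(tokens: List[int]) -> Hidden:
    """One-hot scores for known tokens, all zeros for masked positions."""
    return [[1.0 if t == v else 0.0 for v in range(VOCAB)] for t in tokens]


def toy_block(hidden: Hidden) -> Hidden:
    """Stand-in for a denoising transformer; adds a fixed per-token bias."""
    return [[x + 0.1 * j for j, x in enumerate(row)] for row in hidden]


if __name__ == "__main__":
    print(sample(toy_block, toy_embed, length=6, num_steps=3, num_recursions=2))
```

In this sketch the block is called `num_steps * num_recursions` times per sample while its parameters are stored once, which is the sense in which recursion raises effective depth without raising parameter count.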