The paper introduces the Dependency-Oriented Sampler (DOS), a training-free decoding strategy for masked diffusion language models (MDLMs) that incorporates inter-token dependencies during generation. DOS uses attention matrices from transformer blocks to approximate these dependencies, prioritizing information from unmasked tokens when updating masked positions. Experiments on code generation and mathematical reasoning show that DOS improves performance, and that it can be combined with existing parallel sampling methods to improve generation efficiency without sacrificing quality.
MDLMs can be significantly improved *without* retraining by using attention weights to guide sampling based on inter-token dependencies.
Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs predominantly rely on token-level uncertainty criteria, while largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits attention matrices from transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning tasks. Moreover, DOS can be seamlessly integrated with existing parallel sampling methods, leading to improved generation efficiency without sacrificing generation quality.
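To make the idea of attention-guided unmasking concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm: the abstract does not specify how attention is aggregated or how it is combined with token-level uncertainty. The sketch assumes a single row-wise attention matrix (for example, one already averaged over heads and layers), scores each masked position by the share of its attention that lands on unmasked tokens, and multiplies that share by the model's confidence. The function names `dependency_scores` and `select_positions`, the product rule, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Sequence


def dependency_scores(
    attn: Sequence[Sequence[float]], is_masked: Sequence[bool]
) -> Dict[int, float]:
    """For each masked position i, return the fraction of row i's
    attention mass that falls on unmasked positions.

    Assumption: attn[i][j] is how strongly position i attends to j.
    """
    scores: Dict[int, float] = {}
    for i, masked in enumerate(is_masked):
        if not masked:
            continue
        row = attn[i]
        total = sum(row)
        to_unmasked = sum(w for w, m in zip(row, is_masked) if not m)
        scores[i] = to_unmasked / total if total > 0 else 0.0
    return scores


def select_positions(
    confidence: Sequence[float],
    attn: Sequence[Sequence[float]],
    is_masked: Sequence[bool],
    k: int,
) -> List[int]:
    """Pick up to k masked positions to unmask next.

    Assumption: the ranking score is confidence * dependency score;
    this combination rule is illustrative only.
    """
    dep = dependency_scores(attn, is_masked)
    ranked = sorted(dep, key=lambda i: confidence[i] * dep[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: position 0 is already unmasked, positions 1-3 are masked.
is_masked = [False, True, True, True]
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.7, 0.1, 0.1, 0.1],  # position 1 leans heavily on the unmasked token
    [0.2, 0.3, 0.3, 0.2],  # position 2 mostly attends to masked tokens
    [0.5, 0.2, 0.2, 0.1],
]
confidence = [1.0, 0.6, 0.9, 0.8]

# Confidence-only baseline: rank masked positions by confidence alone.
masked_ids = [i for i, m in enumerate(is_masked) if m]
baseline = sorted(sorted(masked_ids, key=lambda i: confidence[i], reverse=True)[:2])
guided = select_positions(confidence, attn, is_masked, k=2)

print("confidence only:", baseline)
print("attention guided:", guided)
```

In this toy setup the confidence-only baseline unmasks positions 2 and 3, while the attention-guided rule picks positions 1 and 3: position 2 has the highest confidence, but most of its attention points at tokens that are still masked, so it is deferred. This is the sense in which sequence-level information can change the decoding order that a purely token-level criterion would choose.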