PolyU | Mar 16, 2026 | arXiv:2603.15340

DOS: Dependency-Oriented Sampler for Masked Diffusion Language Models

AI Summary

The paper introduces Dependency-Oriented Sampler (DOS), a training-free decoding strategy for masked diffusion language models (MDLMs) that incorporates inter-token dependencies during generation. DOS uses attention matrices from transformer blocks to approximate these dependencies, prioritizing information from unmasked tokens when updating masked positions. Experiments on code generation and mathematical reasoning show that DOS improves performance and can be integrated with existing parallel sampling methods for better efficiency.

Key Contribution

Pre-trained MDLMs can be improved *without* retraining by using attention weights as a proxy for inter-token dependencies to guide sampling.

Abstract

Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs predominantly rely on token-level uncertainty criteria, while largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits attention matrices from transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning tasks. Moreover, DOS can be seamlessly integrated with existing parallel sampling methods, leading to improved generation efficiency without sacrificing generation quality.
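The abstract describes the idea only at a high level, so the following is a minimal sketch of one way attention could steer which masked positions get updated, not the paper's actual algorithm. It assumes a single row-stochastic attention matrix (rows are queries, columns are keys) and a per-position confidence value from the model. The names `dependency_scores` and `select_positions`, and the rule of multiplying confidence by the attention mass placed on unmasked tokens, are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def dependency_scores(
    attn: Sequence[Sequence[float]], is_masked: Sequence[bool]
) -> List[float]:
    """Attention mass each position places on already-unmasked tokens.

    attn[i][j] is the weight query position i assigns to key position j.
    """
    unmasked = [j for j, masked in enumerate(is_masked) if not masked]
    return [sum(row[j] for j in unmasked) for row in attn]


def select_positions(
    attn: Sequence[Sequence[float]],
    is_masked: Sequence[bool],
    confidence: Sequence[float],
    k: int = 1,
) -> List[int]:
    """Pick up to k masked positions to unmask next.

    ASSUMPTION: the score is confidence times dependency mass. The source
    does not state how DOS combines the two signals.
    """
    dep = dependency_scores(attn, is_masked)
    candidates = [i for i, masked in enumerate(is_masked) if masked]
    # Highest combined score first; only masked positions are eligible.
    candidates.sort(key=lambda i: dep[i] * confidence[i], reverse=True)
    return sorted(candidates[:k])


# Toy example: position 0 is already unmasked, positions 1-3 are masked.
attn = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
    [0.3, 0.3, 0.2, 0.2],
]
is_masked = [False, True, True, True]
confidence = [1.0, 0.5, 0.9, 0.8]

print(select_positions(attn, is_masked, confidence, k=1))  # [1]
print(select_positions(attn, is_masked, confidence, k=2))  # [1, 3]
```

In this toy case a purely confidence-based rule would pick position 2 (confidence 0.9), whereas the dependency-weighted rule prefers position 1 because it attends most strongly to the token that is already unmasked. Passing `k > 1` mimics combining the rule with a parallel sampling step.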

Architecture Design (Transformers, SSMs, MoE) | Inference & Quantization | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
