Princeton · Together · UIUC · UT Austin · Apr 13, 2026 · arXiv:2604.11035

Introspective Diffusion Language Models

Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

AI Summary

This paper identifies a key weakness in Diffusion Language Models (DLMs): they lack introspective consistency, meaning they often disagree with their own generated tokens, unlike Autoregressive (AR) models. To address this, the authors introduce Introspective Diffusion Language Model (I-DLM) with a novel introspective strided decoding (ISD) algorithm that allows the model to verify previously generated tokens while generating new ones in parallel. I-DLM achieves comparable quality to same-scale AR models and significantly outperforms previous DLMs in both quality and serving efficiency across 15 benchmarks, demonstrating a substantial advance in DLM capabilities.

Key Contribution

Diffusion language models can now match autoregressive quality, thanks to a clever trick that forces them to agree with themselves.

Abstract

Diffusion language models (DLMs) promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency. Motivated by this observation, we introduce the Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which enables the model to verify previously generated tokens while advancing new ones in the same forward pass. From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks. It reaches 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, exceeding LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively. Beyond quality, I-DLM is designed for the growing demand for large-concurrency serving, delivering about 3x higher throughput than prior state-of-the-art DLMs.
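To make the notion of an introspective acceptance rate concrete, here is a minimal sketch under a simplifying assumption: a token counts as "accepted" when it equals the argmax of the distribution the model assigns to that position on a second, verifying pass. The abstract does not spell out the acceptance rule, so the greedy criterion, the function name `greedy_acceptance_rate`, and the toy probabilities are illustrative assumptions, not the authors' definition.

```python
from typing import Sequence


def greedy_acceptance_rate(
    generated: Sequence[int],
    verify_probs: Sequence[Sequence[float]],
) -> float:
    """Fraction of generated tokens the model 'accepts' on re-scoring.

    ASSUMPTION: acceptance means the generated token is the argmax of the
    model's own distribution at that position. The paper may use a
    different (e.g. probabilistic) rule.
    """
    if len(generated) != len(verify_probs):
        raise ValueError("need one distribution per generated token")
    if not generated:
        return 0.0

    accepted = 0
    for token, probs in zip(generated, verify_probs):
        # Index of the most likely token under the verifying pass.
        best = max(range(len(probs)), key=probs.__getitem__)
        accepted += int(best == token)
    return accepted / len(generated)


# Toy example over a 3-token vocabulary (hypothetical numbers).
generated = [2, 0, 1, 1]
verify_probs = [
    [0.1, 0.2, 0.7],  # argmax 2, matches generated token 2
    [0.6, 0.3, 0.1],  # argmax 0, matches generated token 0
    [0.5, 0.4, 0.1],  # argmax 0, disagrees with generated token 1
    [0.2, 0.7, 0.1],  # argmax 1, matches generated token 1
]
rate = greedy_acceptance_rate(generated, verify_probs)
print(rate)  # 0.75
```

Under this reading, a model that always agrees with its own earlier output would score 1.0, and lower values would indicate the kind of self-disagreement the abstract attributes to prior DLMs.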

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
