Stanford HAI | HKU | Jun 8, 2026 | arXiv:2606.09159

Unified Energy for Invariant and Independent Decoding in Diffusion Language Models

Yuchen Yan, Minkai Xu, Zaiquan Yang, Yatao Bian

AI Summary

This paper systematically analyzes the performance gap between Diffusion Language Models (DLMs) and auto-regressive (AR) baselines, identifying three factors that contribute to it: model capacity, dependency, and invariance. To address them, the authors first propose an invariant energy (Inv-E) with a sampling-based estimator, then combine it with an independent energy (Ind-E) to obtain a unified energy (Uni-E) that can be computed exactly, without sampling-based partition estimation. The authors prove that Uni-E can correct the distribution shift caused by dependency and invariance, and report experiments on DLMs and Diffusion Large Language Models (DLLMs) supporting its effectiveness.

Key Contribution

A unified energy framework (Uni-E) that corrects the distribution shift caused by dependency and invariance in diffusion language models, narrowing their performance gap relative to auto-regressive baselines.

Abstract

Diffusion Language Models (DLMs) enable parallel text generation by iteratively denoising a full sequence, offering attractive flexibility compared to auto-regressive (AR) decoding. However, existing methods fail to fully capture token relationships, leading to a performance gap relative to AR baselines, especially as the degree of parallelism increases. In this paper, we give a systematic analysis of this gap, identifying three key factors: (i) model capacity, (ii) dependency, and (iii) invariance. To address these issues, we first propose an invariant energy (Inv-E), together with an effective sampling-based estimator, to handle the invariance issue. By further combining it with the independent energy (Ind-E), we obtain a unified energy (Uni-E) that accounts for all of these factors. Uni-E enjoys a unique advantage: it can be computed exactly, without sampling-based partition estimation. Moreover, Uni-E is model-agnostic and can therefore be scaled to models of arbitrary size. We further prove that Uni-E can correct the distribution shift caused by dependency and invariance. Extensive experiments across DLMs and Diffusion Large Language Models (DLLMs) demonstrate the effectiveness of the proposed Uni-E.
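The dependency factor named in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's Uni-E and does not reproduce any of its formulas; it only shows, under assumed numbers, why decoding two correlated positions independently from their marginals shifts the distribution away from the true joint. The joint table and the helper names (`marginal`, `factorized`, `kl`) are hypothetical and chosen purely for illustration.

```python
import math

# Toy joint distribution over two token positions (hypothetical numbers,
# chosen only so that the two positions are strongly correlated).
joint = {
    ("a", "a"): 0.45,
    ("a", "b"): 0.05,
    ("b", "a"): 0.05,
    ("b", "b"): 0.45,
}


def marginal(joint, position):
    """Marginal distribution of the token at `position`."""
    out = {}
    for seq, p in joint.items():
        out[seq[position]] = out.get(seq[position], 0.0) + p
    return out


def factorized(joint):
    """Product of per-position marginals: the distribution obtained when
    both positions are sampled in parallel and independently."""
    m0, m1 = marginal(joint, 0), marginal(joint, 1)
    return {(x, y): m0[x] * m1[y] for x in m0 for y in m1}


def kl(p, q):
    """KL(p || q) in nats over a shared finite support."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


parallel = factorized(joint)
gap = kl(joint, parallel)  # equals the mutual information of the two tokens
print(f"KL(joint || product of marginals) = {gap:.4f} nats")
```

In this toy setting the independent sampler assigns probability 0.25 to every pair, including the mismatched pairs that the joint rarely produces. The KL gap equals the mutual information between the two positions, so it grows as more mutually dependent tokens are decoded in the same step.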

Natural Language Processing | Scaling Laws & Emergent Abilities

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
