Apr 5, 2026 · arXiv:2604.04215

DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

Jingyi Yang, Yuxian Jiang, Xuhao Hu, Shuang Cheng, Biqing Qi, Jing Shao

AI Summary

The paper introduces DARE, an open-source framework for post-training and evaluating diffusion large language models (dLLMs), unifying supervised fine-tuning, preference optimization, and dLLM-specific reinforcement learning. DARE builds on verl and OpenCompass to provide a shared execution stack for both masked and block diffusion language models, addressing the fragmented ecosystem of dLLM post-training pipelines. Empirical results across model families like LLaDA and Dream demonstrate DARE's utility as a reusable research substrate for developing, comparing, and deploying post-training methods for dLLMs.

Key Contribution

Stop reinventing the wheel for diffusion LLM alignment: DARE provides a unified framework for SFT, preference optimization, and RL, accelerating research and enabling fair comparisons across dLLMs.

Abstract

Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their open-source ecosystem remains fragmented across model families and, in particular, across post-training pipelines, where reinforcement learning objectives, rollout implementations, and evaluation scripts are often released as paper-specific codebases. This fragmentation slows research iteration, raises the engineering burden of reproduction, and makes fair comparison across algorithms difficult. We present DARE (dLLMs Alignment and Reinforcement Executor), an open framework for post-training and evaluating dLLMs. Built on top of verl [sheng2024hybridflow] and OpenCompass [2023opencompass], DARE unifies supervised fine-tuning, parameter-efficient fine-tuning, preference optimization, and dLLM-specific reinforcement learning under a shared execution stack for both masked and block diffusion language models. Across representative model families including LLaDA, Dream, SDAR, and LLaDA2.x, DARE provides broad algorithmic coverage, reproducible benchmark evaluation, and practical acceleration. Extensive empirical results position DARE as a reusable research substrate for developing, comparing, and deploying post-training methods for current and emerging dLLMs.

Open-Source Models & Weights · RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 57

Year: 2026

Venue: N/A
