KAIST | Feb 18, 2026 | arXiv:2602.16813

One-step Language Modeling via Continuous Denoising

Chanhyuk Lee, Jaehoon Yoo, Manan Agarwal, Sheel Shah, Jerry Huang, Aditi Raghunathan, Seunghoon Hong, Nicholas M. Boffi, Jinwoo Kim

AI Summary

The paper introduces a flow-based language model (FLM) that performs Euclidean denoising over one-hot token encodings, challenging the assumption that discrete diffusion is necessary for generating discrete data. FLM is trained to predict the clean data with a cross-entropy objective, together with a simple time reparameterization that improves training stability and generation quality. On LM1B and OWT, FLM matches state-of-the-art discrete diffusion models in generation quality. Distilling FLM into its associated flow map yields a distilled flow map language model (FMLM) that outperforms recent few-step language models, with one-step generation exceeding their 8-step quality.

Key Contribution

A flow-based language model that denoises one-hot token encodings in continuous space can be distilled to generate text in a *single* step, with one-step quality exceeding the 8-step quality of recent few-step language models.

Abstract

Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp degradation of sample quality in the few-step regime, failing to realize this promise. Here we show that language models leveraging flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, we build a flow-based language model (FLM) that performs Euclidean denoising over one-hot token encodings. We show that the model can be trained by predicting the clean data via a cross entropy objective, where we introduce a simple time reparameterization that greatly improves training stability and generation quality. By distilling FLM into its associated flow map, we obtain a distilled flow map language model (FMLM) capable of few-step generation. On the LM1B and OWT language datasets, FLM attains generation quality matching state-of-the-art discrete diffusion models. With FMLM, our approach outperforms recent few-step language models across the board, with one-step generation exceeding their 8-step quality. Our work calls into question the widely held hypothesis that discrete diffusion processes are necessary for generative modeling over discrete modalities, and paves the way toward accelerated flow-based language modeling at scale. Code is available at https://github.com/david3684/flm.
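As a rough illustration of the training setup the abstract describes (noising one-hot token encodings in Euclidean space and predicting the clean tokens with a cross-entropy loss), here is a minimal NumPy sketch. It rests on several assumptions that the abstract does not specify: a linear interpolant between Gaussian noise at t=0 and one-hot data at t=1, uniform sampling of t in place of the paper's time reparameterization, and a hypothetical `toy_denoiser` standing in for the actual network. It is not the authors' implementation; see the linked repository for that.

```python
import numpy as np

def one_hot(tokens, vocab_size):
    """Encode integer token ids of shape (L,) as one-hot rows of shape (L, V)."""
    return np.eye(vocab_size)[tokens]

def interpolate(x1, x0, t):
    """Assumed linear interpolant: pure noise x0 at t=0, clean data x1 at t=1."""
    return (1.0 - t) * x0 + t * x1

def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def toy_denoiser(x_t, t, weights):
    """Hypothetical stand-in for the network: a per-position linear map to logits.
    A real model would also condition on t; this toy version ignores it."""
    return x_t @ weights

def clean_data_cross_entropy(logits, tokens):
    """Mean cross entropy between predicted token distributions and clean tokens."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(tokens)), tokens].mean()

rng = np.random.default_rng(0)
vocab_size, seq_len = 8, 5

tokens = rng.integers(0, vocab_size, size=seq_len)  # clean token ids
x1 = one_hot(tokens, vocab_size)                    # clean data in R^{L x V}
x0 = rng.standard_normal(x1.shape)                  # Gaussian noise (assumption)
t = rng.uniform()                                   # placeholder time sampling
x_t = interpolate(x1, x0, t)                        # noisy continuous input

weights = np.eye(vocab_size)                        # toy parameters
loss = clean_data_cross_entropy(toy_denoiser(x_t, t, weights), tokens)
print(f"t={t:.3f}, cross-entropy loss={loss:.4f}")
```

The point of the sketch is that the model input is a continuous vector per position rather than a discrete token, while the training target remains the discrete clean token, so a standard cross-entropy loss applies.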

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 79

Year: 2026

Venue: N/A
