Tsinghua AI | Jun 7, 2026 | arXiv:2606.08810

Continuous Language Diffusion as a Decoder-Interface Problem

AI Summary

This paper asks why continuous diffusion language models can generate fluent text from Gaussian-corrupted sentence embeddings, which have no direct linguistic interpretation. Studying Embedded Language Flows (ELF) with a new diagnostic protocol, the authors identify a decoder-basin mechanism: denoising succeeds when trajectories reach regions where the native decoder can read stable tokens. The protocol exposes failures that scalar metrics hide: low mean-squared error can discard linguistic content, and token recovery depends on decoder margin and local decoder sensitivity rather than on latent error alone. Once a trajectory enters the high-margin final-token basin, a single linear readout reaches 97.9% agreement with native decoder decisions at 32k samples.

Key Contribution

Token recovery in continuous language diffusion models depends on whether denoising trajectories reach a high-margin decoder basin, a dependence that scalar evaluation metrics such as mean-squared error and perplexity fail to capture.

Abstract

Gaussian-corrupted sentence embeddings have no direct linguistic interpretation, yet continuous diffusion language models can generate fluent text from them. We study this puzzle through Embedded Language Flows (ELF) and identify a decoder-basin mechanism: denoising succeeds when trajectories reach regions where the native decoder can read stable tokens. We introduce a diagnostic protocol for denoisability, semantic recoverability, order sensitivity, decoder compatibility, and trajectory reliability. It exposes failures hidden by scalar metrics: low mean-squared error can discard linguistic content, low perplexity can reflect low-entropy collapse, and clean latent reconstruction can coexist with a narrow decoder basin. A decoder-margin bound explains why token recovery depends on margin and local decoder sensitivity, not latent error alone. Auditing public ELF checkpoints reveals an interface phase diagram: early predictions are weakly readable, mid-trajectory disagreement marks a competition region, and late predictions enter a high-margin final-token basin. Once inside, token realization is surprisingly simple on generated ELF states: frozen T5 token-embedding lookup recovers $93$--$96\%$ of native decoder decisions, and a single linear readout reaches $97.9\%$ agreement at 32k samples, leaving about a 1.1 perplexity gap in a structured residual tail. A conservative margin gate exits $17$--$27\%$ earlier in denoising steps under an explicit diagnostic monitor. Boundary checks on LangFlow, BitstreamDiffusion, and the Continuous Latent Diffusion Language Model (Cola-DLM) show that the same interface questions remain meaningful when the state object and decoder change. Continuous and latent diffusion language models should therefore be evaluated as representation-decoder systems.
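The margin-gated early exit described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring is a stand-in for the frozen token-embedding lookup, and the function names, the `threshold` and `patience` parameters, and the toy data are all assumptions made for illustration. The abstract does not specify the actual gate rule or the diagnostic monitor.

```python
# Illustrative sketch only. All names, the threshold, and the patience rule
# are hypothetical; the paper's actual gate is not given in the abstract.

def logits_from_embedding_lookup(state, embedding_table):
    """Score each vocabulary entry by its dot product with the latent state."""
    return [sum(s * e for s, e in zip(state, row)) for row in embedding_table]

def top1_margin(logits):
    """Return (argmax index, gap between the two largest logits)."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    best, runner_up = order[0], order[1]
    return best, logits[best] - logits[runner_up]

def margin_gated_exit(trajectory, embedding_table, threshold, patience=2):
    """Exit at the first step where the same token has held a margin of at
    least `threshold` for `patience` consecutive steps; otherwise run to the
    final step. Returns (step index, token index)."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one state")
    streak, last_token, token = 0, None, None
    for step, state in enumerate(trajectory):
        logits = logits_from_embedding_lookup(state, embedding_table)
        token, margin = top1_margin(logits)
        if margin < threshold:
            streak = 0                      # low margin: still competing
        elif streak > 0 and token == last_token:
            streak += 1                     # same token, still high margin
        else:
            streak = 1                      # first confident step for this token
        last_token = token
        if streak >= patience:
            return step, token
    return len(trajectory) - 1, token

# Toy example: 3-token vocabulary in a 2-dimensional latent space.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
toy_trajectory = [
    [0.1, 0.05],   # early: weakly readable, small margin
    [0.3, 0.25],   # mid: tokens 0 and 1 compete
    [0.9, 0.1],    # late: token 0 pulls ahead
    [1.0, 0.0],
    [1.0, 0.0],
]
exit_step, exit_token = margin_gated_exit(toy_trajectory, toy_embeddings, threshold=0.5)
```

In this toy trajectory the gate fires one step before the end, once the readout has been stable and confident for two consecutive steps. A more conservative gate would raise `threshold` or `patience` at the cost of exiting later.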

Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
