Feb 25, 2026 · arXiv:2602.21596

A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers

Trung X. Pham, Kang Zhang, Ji Woo Hong, Chang D. Yoo

AI Summary

This paper investigates the structure of class-conditional embeddings in Diffusion Transformers, revealing a high degree of angular similarity between different class embeddings and a concentration of semantic information in a small subset of dimensions. The authors find that class-conditioned embeddings on ImageNet-1K exhibit over 99% angular similarity, while continuous-condition tasks reach over 99.9%. They demonstrate that pruning low-magnitude dimensions in the embedding space has minimal impact on generation quality, suggesting a semantic bottleneck.
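To make the similarity figure concrete, here is a minimal sketch of how such a measurement might be computed. It assumes "angular similarity" means cosine similarity averaged over all distinct pairs of embeddings, which the summary does not spell out. The function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy data (hypothetical, not from the paper): a large component shared by
# every class, plus a small class-specific offset in one dimension.
shared = [10.0, -8.0, 6.0, 0.1, 0.1, 0.1]
toy_embeddings = [
    [s + (0.5 if i == k else 0.0) for i, s in enumerate(shared)]
    for k in range(3)
]

mean_sim = mean_pairwise_cosine(toy_embeddings)
print(f"mean pairwise cosine similarity: {mean_sim:.4f}")
```

In this toy setup the shared component dominates every vector, so the embeddings point in almost the same direction even though each one carries a distinct class offset.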

Key Contribution

Up to two-thirds (about 66%) of the conditional embedding space in Diffusion Transformers can be pruned without sacrificing generation quality, hinting at opportunities for more efficient conditioning.

Abstract

Diffusion Transformers have achieved state-of-the-art performance in class-conditional and multimodal generation, yet the structure of their learned conditional embeddings remains poorly understood. In this work, we present the first systematic study of these embeddings and uncover a notable redundancy: class-conditioned embeddings exhibit extreme angular similarity, exceeding 99% on ImageNet-1K, while continuous-condition tasks such as pose-guided image generation and video-to-audio generation reach over 99.9%. We further find that semantic information is concentrated in a small subset of dimensions, with head dimensions carrying the dominant signal and tail dimensions contributing minimally. By pruning low-magnitude dimensions, removing up to two-thirds of the embedding space, we show that generation quality and fidelity remain largely unaffected, and in some cases improve. These results reveal a semantic bottleneck in Transformer-based diffusion models, providing new insights into how semantics are encoded and suggesting opportunities for more efficient conditioning mechanisms.
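The pruning step can be illustrated with a short sketch. It assumes that pruning means zeroing, within a single embedding vector, every entry outside the top fraction ranked by absolute value. The abstract does not specify the exact criterion (per-vector or global ranking, zeroing or removing dimensions), so the function name `prune_low_magnitude`, the `keep_fraction` parameter, and the toy vector are all hypothetical.

```python
def prune_low_magnitude(embedding, keep_fraction=1 / 3):
    """Zero out all but the largest-magnitude entries of a vector.

    Assumption: dimensions are ranked by absolute value within this single
    vector; the paper's actual criterion may differ.
    """
    n_keep = max(1, round(len(embedding) * keep_fraction))
    # Indices sorted by descending magnitude; the first n_keep are the "head".
    order = sorted(
        range(len(embedding)), key=lambda i: abs(embedding[i]), reverse=True
    )
    head = set(order[:n_keep])
    return [x if i in head else 0.0 for i, x in enumerate(embedding)]


# Toy vector (hypothetical): three dominant "head" dimensions and six
# near-zero "tail" dimensions.
embedding = [10.0, -8.0, 0.2, -0.1, 6.0, 0.05, 0.3, -0.2, 0.1]
pruned = prune_low_magnitude(embedding, keep_fraction=1 / 3)
print(pruned)
```

Keeping one third of the dimensions matches the "removing up to two-thirds" setting in the abstract. Whether generation quality survives such pruning is an empirical claim of the paper that this sketch does not test.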

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
