Ohio State, UMich | Jun 8, 2026 | arXiv:2606.09718

Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles

Xiao Li, Yixuan Jia, Zekai Zhang, Xiang Li, Lianghe Shi, Jinxin Zhou, Zhihui Zhu, Liyue Shen, Qing Qu

AI Summary

This paper introduces a framework to jointly evaluate the representation and generative capabilities of diffusion models by leveraging self-supervised learning principles. The authors decompose features into invariant and residual components, deriving the Invariant Contamination Ratio (ICR) to quantify the contamination of invariant signals by residual variations. Key findings reveal that invariance peaks at intermediate noise levels, correlating with optimal classification performance, while ICR serves as an early indicator of the transition from generalization to memorization during training.

Key Contribution

Invariance in diffusion model features peaks at intermediate noise levels, the same levels that yield the best downstream classification performance.

Abstract

Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspiration from self-supervised learning (SSL), we introduce a framework for jointly evaluating the representation and generation capabilities of diffusion models. Specifically, we decompose features into invariant and residual components and derive the Invariant Contamination Ratio (ICR), a Fisher-based metric that quantifies how residual variation contaminates invariant signal in feature space. We use this framework to analyze both discriminative and generative behavior of diffusion models. On the representation side, we find that invariance peaks at intermediate noise levels, which also yield the best downstream classification performance. On the generative side, we study how training transitions from genuine generalization to memorization in data-limited regimes, and show that ICR serves as a sensitive training-time indicator of early learning: increasing residual energy along Fisher directions marks the onset of memorization, detectable from training features alone without external evaluators or held-out test sets. Overall, our results show that diffusion models can be monitored from a self-supervised perspective through the geometry of their learned representations.
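The abstract does not state the formula for ICR, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes features come with group labels (for example, several views of the same image), treats each group mean as the invariant component and the deviation from that mean as the residual, and reports residual energy relative to invariant energy along the leading directions of the between-group scatter. The function names, the `groups` argument, and the choice of `k` are all hypothetical.

```python
import numpy as np


def scatter_matrices(features, groups):
    """Return (between-group, within-group) scatter, both divided by n.

    Assumption: samples sharing a group id should map to the same
    invariant signal, so the group mean is the invariant component and
    the deviation from it is the residual component.
    """
    features = np.asarray(features, dtype=float)
    groups = np.asarray(groups)
    grand_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_between = np.zeros((dim, dim))
    s_within = np.zeros((dim, dim))
    for g in np.unique(groups):
        x = features[groups == g]
        mu = x.mean(axis=0)                 # invariant component
        resid = x - mu                      # residual component
        diff = (mu - grand_mean)[:, None]
        s_between += len(x) * (diff @ diff.T)
        s_within += resid.T @ resid
    n = len(features)
    return s_between / n, s_within / n


def contamination_ratio_sketch(features, groups, k=None, eps=1e-12):
    """Residual energy over invariant energy in the top-k invariant directions.

    Hypothetical stand-in for a Fisher-style contamination measure;
    larger values mean more residual variation leaks into the
    directions that carry the invariant signal.
    """
    s_b, s_w = scatter_matrices(features, groups)
    _, eigvecs = np.linalg.eigh(s_b)        # eigenvalues in ascending order
    if k is None:
        n_groups = len(np.unique(groups))
        k = max(1, min(n_groups - 1, s_b.shape[0]))
    top = eigvecs[:, -k:]                   # leading between-group directions
    invariant_energy = np.trace(top.T @ s_b @ top)
    residual_energy = np.trace(top.T @ s_w @ top)
    return residual_energy / (invariant_energy + eps)
```

Under these assumptions, the ratio could be tracked on training features across checkpoints or noise levels; a rising value would correspond to the increasing residual energy that the abstract associates with the onset of memorization.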

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
