This paper introduces a framework to jointly evaluate the representation and generative capabilities of diffusion models by leveraging self-supervised learning principles. The authors decompose features into invariant and residual components, deriving the Invariant Contamination Ratio (ICR) to quantify the contamination of invariant signals by residual variations. Key findings reveal that invariance peaks at intermediate noise levels, correlating with optimal classification performance, while ICR serves as an early indicator of the transition from generalization to memorization during training.
Invariance in diffusion models peaks at intermediate noise levels, the same levels that yield the best downstream classification performance.
Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains underexplored. Drawing inspiration from self-supervised learning (SSL), we introduce a framework for jointly evaluating the representation and generation capabilities of diffusion models. Specifically, we decompose features into invariant and residual components and derive the Invariant Contamination Ratio (ICR), a Fisher-based metric that quantifies how residual variation contaminates the invariant signal in feature space. We use this framework to analyze both the discriminative and the generative behavior of diffusion models. On the representation side, we find that invariance peaks at intermediate noise levels, which also yield the best downstream classification performance. On the generative side, we study how training transitions from genuine generalization to memorization in data-limited regimes, and show that ICR serves as a sensitive training-time indicator of early learning: increasing residual energy along Fisher directions marks the onset of memorization, detectable from training features alone without external evaluators or held-out test sets. Overall, our results show that diffusion models can be monitored from a self-supervised perspective through the geometry of their learned representations.
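The abstract does not give the formula for ICR, so the following is only a rough sketch of the invariant/residual decomposition it describes, not the authors' metric. It assumes features are extracted for several augmented views of each sample, treats the per-sample mean over views as the invariant component, and treats the deviation from that mean as the residual. The function names `decompose` and `toy_contamination_ratio`, the `(n_samples, n_views, dim)` layout, and the plain energy ratio (used here in place of the paper's Fisher-based weighting) are all assumptions made for illustration.

```python
import numpy as np


def decompose(features):
    """Split features into invariant and residual parts (hypothetical helper).

    features: array of shape (n_samples, n_views, dim), where the views are
    assumed to be augmentations of the same underlying sample.
    Returns (invariant, residual) with shapes (n_samples, dim) and
    (n_samples, n_views, dim).
    """
    features = np.asarray(features, dtype=float)
    invariant = features.mean(axis=1)            # shared across views
    residual = features - invariant[:, None, :]  # view-specific deviation
    return invariant, residual


def toy_contamination_ratio(features, eps=1e-12):
    """Residual energy divided by invariant energy (a stand-in, NOT the paper's ICR).

    Invariant energy is the mean squared norm of the per-sample invariant
    vectors after centering across samples; residual energy is the mean
    squared norm of the residuals over all samples and views.
    """
    invariant, residual = decompose(features)
    centered = invariant - invariant.mean(axis=0, keepdims=True)
    inv_energy = np.sum(centered ** 2) / centered.shape[0]
    res_energy = np.sum(residual ** 2) / (residual.shape[0] * residual.shape[1])
    return res_energy / (inv_energy + eps)
```

Under these assumptions, a ratio that rises over training would correspond to growing residual variation relative to the invariant signal, which is the kind of trend the abstract associates with the onset of memorization. The actual metric additionally restricts attention to Fisher directions, which this sketch does not attempt.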