Cambridge · Jun 4, 2026 · arXiv:2606.06179

Diffusion Models Observe Only Gradients: A Geometric Perspective on Score Matching Errors

Naïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan

AI Summary

This paper critiques the conventional reliance on the $L^2$ score matching error for training and analyzing score-based diffusion models, showing that this metric can misrepresent how well a model matches the target distribution. Using a Helmholtz-Hodge decomposition, the authors split the score error into a gradient component and a solenoidal component and show that only the gradient component affects the marginal Fokker-Planck dynamics. Their results include an impossibility result (no monotone function of the $L^2$ score error can uniformly lower bound a divergence between the learned and target distributions), an upper bound on the Kullback-Leibler divergence that depends only on the gradient component, and a new estimator of the gradient component that correlates better with sample quality than the full $L^2$ error.

Key Contribution

A learned diffusion model can achieve a perfect match to the target distribution while exhibiting an arbitrarily large $L^2$ score error, challenging existing training paradigms.

Abstract

Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right intrinsic measure of marginal distributional quality: a learned diffusion model can incur arbitrarily large $L^2$ score error while perfectly matching the target distribution. By decomposing score errors into a gradient and a solenoidal component (a Helmholtz-Hodge decomposition), we identify the geometric reason behind this: only the gradient component enters the marginal Fokker-Planck dynamics, while the solenoidal component is structurally invisible. We make this precise in three results. First, building on the corrected geometry, we prove an impossibility result: no monotone function of the $L^2$ score error can uniformly lower bound any divergence between the learned and target distributions. Second, we derive an upper bound on the Kullback-Leibler divergence that depends only on the observable gradient component of the error, tightening the standard Girsanov bound and identifying its looseness as the cost of operating on path-space rather than marginal-space dynamics. Third, we give a tractable estimator of the gradient component via a dual Sobolev identity, which is shown to empirically correlate substantially better with sample quality than the full $L^2$ error.
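The claim that a solenoidal error is invisible to the marginals can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes a 2D standard Gaussian target $p$ and a hypothetical error field $e(x) = c\,Jx$, where $J$ is a 90-degree rotation. This field satisfies $\nabla \cdot (p\, e) = 0$, so it is purely solenoidal with respect to $p$, yet its $L^2(p)$ norm is $2c^2$, which grows without bound in $c$. Transporting samples along $e$ only rotates them, so the Gaussian marginal is unchanged. The names `rotational_error` and `flow` are illustrative only.

```python
import numpy as np

def rotational_error(x, c):
    """Hypothetical error field e(x) = c * J x, with J a 90-degree rotation.

    For the 2D standard Gaussian p, div(p * e) = 0, so e is purely solenoidal.
    """
    J = np.array([[0.0, -1.0], [1.0, 0.0]])
    return c * x @ J.T

def flow(x0, c, t):
    """Exact solution of dx/dt = c * J x: a rotation by angle c * t."""
    a = c * t
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return x0 @ R.T

rng = np.random.default_rng(0)
x0 = rng.standard_normal((100_000, 2))  # samples from the toy target p

for c in (1.0, 10.0, 100.0):
    # Monte Carlo estimate of the L2(p) norm of the error; analytically 2 * c**2.
    l2_error = np.mean(np.sum(rotational_error(x0, c) ** 2, axis=1))
    # Transport the samples along the error field for unit time.
    xt = flow(x0, c, t=1.0)
    print(f"c={c:6.1f}  L2 error ~ {l2_error:10.1f}  "
          f"mean={xt.mean(axis=0).round(3)}  var={xt.var(axis=0).round(3)}")
```

In this toy setting the printed $L^2$ error scales like $c^2$, while the empirical mean and variance of the transported samples stay close to those of the standard Gaussian for every $c$.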

Computer Vision

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
