This paper identifies a fundamental KL identity for exponential families that expresses the difference between the KL divergences from a distribution to two members of the family in terms of the log-partition function and the moment of that distribution. From this identity and the non-negativity of KL divergence, the authors derive a suite of classical results that typically require separate, more complex derivations, including Pythagorean theorems, convexity of the log-partition function, and the Gibbs variational principle. The same framework also recovers the explicit optimizer in KL-regularized reward maximization, connecting it to exponential tilting in entropy-regularized control and RLHF.
A single KL identity unlocks a surprisingly simple and unified derivation of core results for exponential families, streamlining the theoretical foundations of variational inference, entropy-regularized RL, and RLHF.
Exponential families encompass the distributions central to modern machine learning -- softmax, Gaussians, and Boltzmann distributions -- and underlie the theory of variational inference, entropy-regularized reinforcement learning, and RLHF. We isolate a simple identity for exponential families that expresses the KL difference $\mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1})$ in terms of the log-partition function $A(\lambda)$ and the moment $\mu_q$. Remarkably, this identity together with the single fact that $\mathrm{KL} \geq 0$ (with equality iff $p = q$) suffices, by direct substitution and rearrangement, to derive a cluster of results that are classically obtained by separate, heavier arguments: a generalized three-point identity for arbitrary reference distributions, Pythagorean theorems for I-projections and reverse I-projections, convexity of the log-partition function, identification of its Legendre dual in KL terms, the Gibbs variational principle, and the explicit optimizer in KL-regularized reward maximization, including the exponential tilting formula underlying entropy-regularized control and RLHF. Beyond these purely algebraic consequences, standard analytic arguments recover the gradient formula for the log-partition function, the Bregman representation of within-family KL divergence, and the surjectivity of the moment map. The note is self-contained.
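As an illustration only, the identity can be checked numerically on a small finite sample space. The abstract does not state the identity in full, so the sketch below assumes the standard parametrization $p_\lambda(x) = h(x)\exp(\langle \lambda, T(x)\rangle - A(\lambda))$, under which $\mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1}) = A(\lambda_2) - A(\lambda_1) - \langle \lambda_2 - \lambda_1, \mu_q \rangle$ with $\mu_q = \mathbb{E}_q[T(x)]$. The toy statistics `T`, base measure `h`, rewards `r`, temperature `beta`, and all function names are hypothetical choices made for this example, and the objective $\mathbb{E}_q[r] - \beta\,\mathrm{KL}(q \| p_0)$ is one common convention for KL-regularized reward maximization rather than necessarily the paper's.

```python
import math

def log_partition(lam, T, h):
    """A(lam) = log sum_x h(x) * exp(<lam, T(x)>) on a finite sample space."""
    return math.log(sum(hx * math.exp(sum(l * t for l, t in zip(lam, Tx)))
                        for hx, Tx in zip(h, T)))

def family_member(lam, T, h):
    """p_lam(x) = h(x) * exp(<lam, T(x)> - A(lam))."""
    A = log_partition(lam, T, h)
    return [hx * math.exp(sum(l * t for l, t in zip(lam, Tx)) - A)
            for hx, Tx in zip(h, T)]

def kl(q, p):
    """KL(q || p) on a finite space, assuming p is strictly positive."""
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)

# Hypothetical toy setup: 4 outcomes, 2-dimensional sufficient statistic.
T = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, -1.0)]
h = [1.0, 1.0, 0.5, 2.0]
q = [0.1, 0.2, 0.3, 0.4]  # arbitrary distribution, not a family member
lam1, lam2 = (0.3, -0.7), (-1.2, 0.5)

# Moment of q: mu_q = E_q[T(x)].
mu_q = [sum(qx * Tx[i] for qx, Tx in zip(q, T)) for i in range(2)]

# Left side: KL difference. Right side: log-partition and moment terms.
lhs = kl(q, family_member(lam2, T, h)) - kl(q, family_member(lam1, T, h))
rhs = (log_partition(lam2, T, h) - log_partition(lam1, T, h)
       - sum((b - a) * m for a, b, m in zip(lam1, lam2, mu_q)))
identity_gap = abs(lhs - rhs)

def tilt(p0, r, beta):
    """Exponential tilting: q*(x) proportional to p0(x) * exp(r(x) / beta).

    Also returns beta * log E_{p0}[exp(r / beta)], the value of the
    objective E_q[r] - beta * KL(q || p0) at q*.
    """
    w = [px * math.exp(rx / beta) for px, rx in zip(p0, r)]
    Z = sum(w)
    return [wx / Z for wx in w], beta * math.log(Z)

def objective(q, p0, r, beta):
    """KL-regularized reward: E_q[r] - beta * KL(q || p0)."""
    return sum(qx * rx for qx, rx in zip(q, r)) - beta * kl(q, p0)

p0 = family_member(lam1, T, h)  # reference distribution
r = [1.0, -0.5, 2.0, 0.0]       # hypothetical rewards
beta = 0.7                      # hypothetical regularization strength
q_star, optimal_value = tilt(p0, r, beta)
```

In this sketch `identity_gap` should vanish up to floating-point error, `objective(q_star, p0, r, beta)` should agree with `optimal_value`, and any other distribution, such as `q`, should score no higher, which is the content of the Gibbs variational principle on a finite space.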