This paper provides a theoretical analysis of the implicit regularization induced by Deep Linear Discriminant Analysis (Deep LDA), a scale-invariant objective that minimizes intraclass variance and maximizes interclass distance. The analysis focuses on the gradient flow of the Deep LDA loss on an L-layer diagonal linear network. The key result demonstrates that under balanced initialization, the network transforms additive gradient updates into multiplicative weight updates, leading to automatic conservation of the (2/L) quasi-norm.
Deep LDA implicitly conserves a (2/L) quasi-norm in deep linear networks, revealing a novel form of implicit regularization in discriminative metric learning.
While the implicit bias (or implicit regularization) of standard loss functions has been studied, the optimization geometry induced by discriminative metric-learning objectives remains largely unexplored. To the best of our knowledge, this paper presents an initial theoretical analysis of the implicit regularization induced by Deep LDA, a scale-invariant objective designed to minimize intraclass variance and maximize interclass distance. By analyzing the gradient flow of the loss on an L-layer diagonal linear network, we prove that under balanced initialization, the network architecture transforms standard additive gradient updates into multiplicative weight updates, which leads to automatic conservation of the (2/L) quasi-norm.
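The conservation claim can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's construction: it replaces the Deep LDA loss with a simple two-class Fisher-style ratio (within-class variance over squared mean gap), which is scale-invariant in the effective weights w, and it approximates gradient flow with small explicit Euler steps. The vectors `a`, `b`, and `u`, and all function names, are hypothetical choices made for the example. Under balanced initialization every layer holds the same vector u, so w_i = u_i^L and each step rescales w_i by a factor that depends on its own magnitude, which is the multiplicative behaviour the abstract describes. The tracked quantity is the sum of |w_i|^(2/L), whose conservation is equivalent to conservation of the (2/L) quasi-norm.

```python
def fisher_ratio(w, a, b):
    """Toy scale-invariant loss: within-class variance / squared mean gap.

    `a` holds per-coordinate within-class variances and `b` the class-mean
    difference; both are hypothetical stand-ins for Deep LDA statistics.
    """
    within = sum(ai * wi * wi for ai, wi in zip(a, w))
    gap = sum(bi * wi for bi, wi in zip(b, w))
    return within / (gap * gap)


def fisher_ratio_grad(w, a, b):
    """Gradient of fisher_ratio with respect to the effective weights w."""
    within = sum(ai * wi * wi for ai, wi in zip(a, w))
    gap = sum(bi * wi for bi, wi in zip(b, w))
    return [2.0 * ai * wi / gap**2 - 2.0 * within * bi / gap**3
            for ai, bi, wi in zip(a, b, w)]


def quasi_norm_power(w, depth):
    """Sum of |w_i|^(2/L); constant exactly when the (2/L) quasi-norm is."""
    return sum(abs(wi) ** (2.0 / depth) for wi in w)


def simulate(depth=3, steps=2000, lr=1e-3):
    a = [1.0, 2.0, 0.5]   # assumed within-class variances
    b = [1.0, 0.5, 0.3]   # assumed class-mean difference
    # Balanced initialization: all `depth` layers share the same diagonal u.
    # Identical layers receive identical gradients, so one copy suffices.
    u = [1.0, 0.8, 0.6]

    w = [ui**depth for ui in u]
    start = (fisher_ratio(w, a, b), quasi_norm_power(w, depth))

    for _ in range(steps):
        w = [ui**depth for ui in u]
        g = fisher_ratio_grad(w, a, b)
        # Chain rule for one layer: dLoss/du_i = g_i * (product of the
        # other depth-1 layers) = g_i * u_i^(depth-1).
        u = [ui - lr * gi * ui ** (depth - 1) for ui, gi in zip(u, g)]

    w = [ui**depth for ui in u]
    end = (fisher_ratio(w, a, b), quasi_norm_power(w, depth))
    return start, end


if __name__ == "__main__":
    (loss0, q0), (loss1, q1) = simulate()
    print("loss:", loss0, "->", loss1)
    print("sum |w_i|^(2/L):", q0, "->", q1)
```

The reason the quantity stays put is Euler's identity for degree-zero homogeneous functions: the gradient is orthogonal to w, so the first-order change in the sum of u_i^2 (which equals the sum of |w_i|^(2/L)) vanishes. With discrete steps the conservation is only approximate, and the residual drift shrinks as the step size `lr` is reduced.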