This paper introduces a functional gradient descent (FGD) algorithm that adapts the representation of the functional gradients during optimization, addressing the fact that functional gradients are infinite-dimensional and can only be approximated in practice. By incorporating the approximation error into the theoretical analysis, the authors establish convergence to stationary points for smooth losses, and to global minimizers under smoothness plus a Polyak-Łojasiewicz-type condition. The authors report that the method outperforms both fixed-approximation FGD and neural network baselines in efficiency and accuracy on regression, numerical solution of PDEs, and computer vision tasks.
Adapting the representation of functional gradients during optimization preserves convergence guarantees (to a global minimizer under smoothness and a Polyak-Łojasiewicz-type condition) while improving efficiency and accuracy over fixed approximations.
Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient descent (FGD), that is, gradient descent directly in function space, which benefits from strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice because functional gradients are infinite-dimensional, and thus can be neither fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point (for smooth losses) and to a global minimizer (under smoothness and a Polyak-Łojasiewicz-type condition) regardless of our approximations. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, numerical solution of PDEs, and modern computer vision. Across settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.
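To make the idea of gradient descent in function space concrete, here is a minimal sketch of plain FGD for least-squares regression. It is not the adaptive algorithm proposed in the paper. It assumes a Gaussian-kernel RKHS, a special case in which the functional gradient is a finite combination of kernel functions centered at the training points, so it can be stored exactly. All names (`gaussian_kernel`, `fgd_step`, `run_fgd`), the bandwidth, the step size, and the toy data are illustrative assumptions.

```python
import math

def gaussian_kernel(x, z, bandwidth=0.2):
    # Assumed kernel; k(x, x) = 1, which bounds the smoothness constant by 1.
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))

def predict(alphas, centers, x, bandwidth=0.2):
    # The iterate is stored as f(x) = sum_i alpha_i * k(center_i, x).
    return sum(a * gaussian_kernel(c, x, bandwidth) for a, c in zip(alphas, centers))

def half_mse(alphas, xs, ys, bandwidth=0.2):
    # L(f) = 1/(2n) * sum_i (f(x_i) - y_i)^2
    n = len(xs)
    return sum((predict(alphas, xs, x, bandwidth) - y) ** 2
               for x, y in zip(xs, ys)) / (2.0 * n)

def fgd_step(alphas, xs, ys, step_size, bandwidth=0.2):
    # In the RKHS, the functional gradient of L at f is
    #   (1/n) * sum_i (f(x_i) - y_i) * k(x_i, .),
    # so the update f <- f - step_size * grad only shifts each coefficient
    # by its own residual.
    n = len(xs)
    residuals = [predict(alphas, xs, x, bandwidth) - y for x, y in zip(xs, ys)]
    return [a - step_size * r / n for a, r in zip(alphas, residuals)]

def run_fgd(xs, ys, num_steps=200, step_size=1.0, bandwidth=0.2):
    alphas = [0.0] * len(xs)  # start from the zero function
    losses = [half_mse(alphas, xs, ys, bandwidth)]
    for _ in range(num_steps):
        alphas = fgd_step(alphas, xs, ys, step_size, bandwidth)
        losses.append(half_mse(alphas, xs, ys, bandwidth))
    return alphas, losses

# Toy data (assumed): samples of one period of a sine wave on [0, 1].
xs = [i / 10.0 for i in range(11)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
alphas, losses = run_fgd(xs, ys)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this special case the loss is convex in function space and the gradient has an exact finite representation, so no approximation error arises. The setting the abstract targets is the general one, where the gradient must be approximated, and where the proposed method adapts that approximation during optimization instead of fixing it in advance.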