This paper analyzes federated learning with delayed stochastic gradients, demonstrating that a pre-chosen diminishing step size achieves optimal SGD convergence rates for both nonconvex and strongly convex objectives, matching the performance of delay-adaptive step size methods. The analysis considers potentially biased and delayed stochastic gradient estimates transmitted from local agents to a central server. The key result is a theoretical proof that diminishing step sizes are sufficient to recover optimal SGD rates, simplifying implementation compared to adaptive schemes.
Forget fancy adaptive schemes: simple diminishing step sizes are provably sufficient for optimal performance in federated learning with delayed gradients.
We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of agents' local cost functions. Each agent is allowed to transmit stochastic (potentially biased and delayed) estimates of its local gradient. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for nonconvex and strongly convex objectives.
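To make the setting concrete, here is a minimal toy sketch of server-side SGD driven by stale, noisy gradients under a pre-chosen diminishing step size. It is not the paper's algorithm or experimental setup. Everything specific in it is an assumption made for illustration: the scalar quadratic local costs $f_i(x) = \tfrac{1}{2}(x - c_i)^2$, the uniformly random bounded delay, the schedule $\eta_t = a/(t+b)$, and the hypothetical names `delayed_sgd`, `centers`, `max_delay`, `a`, and `b`.

```python
import random


def delayed_sgd(centers, steps=20000, max_delay=5, noise=0.1,
                a=1.0, b=10.0, seed=0):
    """Toy server loop: SGD with delayed, noisy gradients.

    Assumed setup (illustrative only): agent i holds the local cost
    f_i(x) = 0.5 * (x - centers[i])**2, so the global objective (the
    average of the f_i) is minimized at the mean of `centers`.
    """
    rng = random.Random(seed)
    x = 0.0
    history = [x]  # history[k] is the server iterate after k updates

    for t in range(steps):
        # One agent reports per step, chosen uniformly at random.
        i = rng.randrange(len(centers))

        # The report is stale: it was computed at an iterate that is
        # `delay` updates old (bounded by max_delay and by t).
        delay = rng.randint(0, min(max_delay, t))
        x_stale = history[t - delay]

        # Stochastic gradient of f_i at the stale iterate.
        grad = (x_stale - centers[i]) + rng.gauss(0.0, noise)

        # Pre-chosen diminishing step size: it depends only on t,
        # never on the observed delay.
        eta = a / (t + b)
        x -= eta * grad
        history.append(x)

    return x


if __name__ == "__main__":
    centers = [1.0, 2.0, 3.0, 6.0]  # mean is 3.0
    print(delayed_sgd(centers))
```

The point of the sketch is that the step size schedule is fixed before the run and never inspects `delay`, in contrast to a delay-adaptive rule that would shrink the step when a report is stale. On this toy problem the final iterate should land close to the mean of `centers`, though a single run says nothing about rates.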