Search papers, labs, and topics across Lattice.
This paper introduces Byz-DM21, a Byzantine-robust distributed learning algorithm that uses a double-momentum mechanism and error feedback to improve communication efficiency without requiring large batch sizes. Byz-DM21 achieves $\mathcal{O}(\varepsilon^{-4})$ convergence to $\varepsilon$-stationary points, and a variance-reduced variant, Byz-VR-DM21, further improves this to $\mathcal{O}(\varepsilon^{-3 })$. The algorithms' effectiveness is validated through numerical experiments, also extending the analysis to functions satisfying the Polyak-艁ojasiewicz condition.
Achieve faster, Byzantine-robust distributed learning by combining double momentum with variance reduction, eliminating the need for large batch sizes.
In collaborative and distributed learning, Byzantine robustness reflects a major facet of optimization algorithms. Such distributed algorithms are often accompanied by transmitting a large number of parameters, so communication compression is essential for an effective solution. In this paper, we propose Byz-DM21, a novel Byzantine-robust and communication-efficient stochastic distributed learning algorithm. Our key innovation is a novel gradient estimator based on a double-momentum mechanism, integrating recent advancements in error feedback techniques. Using this estimator, we design both standard and accelerated algorithms that eliminate the need for large batch sizes while maintaining robustness against Byzantine workers. We prove that the Byz-DM21 algorithm has a smaller neighborhood size and converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-4})$ iterations. To further enhance efficiency, we introduce a distributed variant called Byz-VR-DM21, which incorporates local variance reduction at each node to progressively eliminate variance from random approximations. We show that Byz-VR-DM21 provably converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-3 })$ iterations. Additionally, we extend our results to the case where the functions satisfy the Polyak-艁ojasiewicz condition. Finally, numerical experiments demonstrate the effectiveness of the proposed method.