Mar 16, 2026arXiv:2603.15144

Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction

AI Summary

This paper introduces Byz-DM21, a Byzantine-robust distributed learning algorithm that uses a double-momentum mechanism and error feedback to improve communication efficiency without requiring large batch sizes. Byz-DM21 achieves $\mathcal{O}(\varepsilon^{-4})$ convergence to $\varepsilon$-stationary points, and a variance-reduced variant, Byz-VR-DM21, further improves this to $\mathcal{O}(\varepsilon^{-3 })$. The algorithms' effectiveness is validated through numerical experiments, also extending the analysis to functions satisfying the Polyak-Łojasiewicz condition.

Key Contribution

Achieve faster, Byzantine-robust distributed learning by combining double momentum with variance reduction, eliminating the need for large batch sizes.

Abstract

In collaborative and distributed learning, Byzantine robustness reflects a major facet of optimization algorithms. Such distributed algorithms are often accompanied by transmitting a large number of parameters, so communication compression is essential for an effective solution. In this paper, we propose Byz-DM21, a novel Byzantine-robust and communication-efficient stochastic distributed learning algorithm. Our key innovation is a novel gradient estimator based on a double-momentum mechanism, integrating recent advancements in error feedback techniques. Using this estimator, we design both standard and accelerated algorithms that eliminate the need for large batch sizes while maintaining robustness against Byzantine workers. We prove that the Byz-DM21 algorithm has a smaller neighborhood size and converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-4})$ iterations. To further enhance efficiency, we introduce a distributed variant called Byz-VR-DM21, which incorporates local variance reduction at each node to progressively eliminate variance from random approximations. We show that Byz-VR-DM21 provably converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-3 })$ iterations. Additionally, we extend our results to the case where the functions satisfy the Polyak-Łojasiewicz condition. Finally, numerical experiments demonstrate the effectiveness of the proposed method.

Distributed Systems & Hardware Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction

Related Papers