The paper addresses three problems in decentralized federated learning (DFL) for LLMs that arise when devices hold heterogeneous multi-task datasets: catastrophic forgetting during fine-tuning, inefficient communication during aggregation, and knowledge interference during inference. To mitigate them, the authors introduce a sparse-and-orthogonal LoRA method that keeps model updates orthogonal, a cluster-based connection topology for aggregation, and an implicit mixture of experts (MoE) mechanism for inference. Simulations show up to a 73% reduction in communication resource consumption and a 5% improvement in average performance compared with traditional LoRA.
Orthogonal LoRA updates and cluster-based aggregation cut communication costs by up to 73% while improving average performance by 5% in decentralized federated LLM fine-tuning.
Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via wireless connections for knowledge integration. However, directly aggregating parameters fine-tuned on heterogeneous datasets induces three primary issues across the DFL life-cycle: (i) catastrophic knowledge forgetting during the fine-tuning process, arising from conflicting update directions caused by data heterogeneity; (ii) inefficient communication and convergence during the model aggregation process, due to bandwidth-intensive redundant model transmissions; and (iii) multi-task knowledge interference during the inference process, resulting from the coexistence of incompatible knowledge representations. To address these issues in a fully decentralized scenario, we first propose a sparse-and-orthogonal LoRA that ensures orthogonality between model updates to eliminate direction conflicts during fine-tuning. Then, we analyze how device connection topology affects multi-task performance, which motivates a cluster-based topology design during aggregation. Finally, we propose an implicit mixture of experts (MoE) mechanism to avoid the coexistence of incompatible knowledge during inference. Simulation results demonstrate that the proposed approach reduces communication resource consumption by up to 73% and enhances average performance by 5% compared with the traditional LoRA method.
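The abstract does not spell out how sparsity and orthogonality are combined, so the following is only a minimal sketch of one way the two properties can go together, not the authors' construction. It assumes each device is given a disjoint binary mask over a flattened LoRA parameter vector; updates with disjoint supports then have zero inner product, and each device only needs to send its own masked entries. The names `disjoint_masks` and `masked_update`, the vector size, and the device count are all hypothetical.

```python
import numpy as np

def disjoint_masks(num_params, num_devices, rng):
    """Assign every parameter index to exactly one device (hypothetical scheme)."""
    owner = rng.permutation(num_params) % num_devices
    return [owner == d for d in range(num_devices)]

def masked_update(raw_update, mask):
    """Zero out every entry of the update that falls outside the device's mask."""
    return np.where(mask, raw_update, 0.0)

rng = np.random.default_rng(0)
num_params, num_devices = 64, 4  # illustrative sizes, not taken from the paper

masks = disjoint_masks(num_params, num_devices, rng)

# Stand-ins for locally computed LoRA updates (random here, gradients in practice).
raw = [rng.normal(size=num_params) for _ in range(num_devices)]
updates = [masked_update(u, m) for u, m in zip(raw, masks)]

# Gram matrix of pairwise inner products between device updates.
gram = np.array([[float(a @ b) for b in updates] for a in updates])
off_diag = gram - np.diag(np.diag(gram))

# Disjoint supports make every cross-device inner product vanish.
print("updates mutually orthogonal:", np.count_nonzero(off_diag) == 0)
# Each device transmits only the entries selected by its own mask.
print("entries sent per device:", [int(m.sum()) for m in masks])
```

In this toy setting, summing the masked updates never mixes two devices' contributions in the same coordinate, which is the intuition behind avoiding conflicting update directions. The sparsity level here is an arbitrary choice and should not be read as reproducing the reported communication savings.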