Mar 19, 2026 · arXiv:2603.19067

Communication-Efficient and Robust Multi-Modal Federated Learning via Latent-Space Consensus

Mohamed Badi, Chaouki Ben Issaid, Mehdi Bennis

AI Summary

This paper introduces CoMFed, a federated learning framework for multi-modal data that learns compressed latent representations using projection matrices to align feature spaces across heterogeneous clients. A latent-space regularizer is used to improve cross-modal consistency and robustness to outliers in this setting. Experiments on human activity recognition demonstrate CoMFed achieves competitive accuracy with reduced communication overhead compared to existing methods.
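The projection step can be made concrete with a minimal sketch in plain Python. It assumes each client holds a linear projection matrix P_k that maps its own feature width onto one shared latent width. The names `make_projection`, `project`, and `latent_payloads`, the client names and feature widths, and `LATENT_DIM = 16` are all hypothetical and not taken from the paper. The random matrices stand in for projections that would be learned during training.

```python
import random

LATENT_DIM = 16  # assumed width of the shared latent space (not taken from the paper)


def make_projection(in_dim, latent_dim, seed):
    """Hypothetical projection P_k of shape (latent_dim x in_dim), randomly initialized."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5  # keeps projected values on a similar scale across widths
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(latent_dim)]


def project(P, x):
    """Compute z = P x, mapping a client-specific feature vector into the shared latent space."""
    if any(len(row) != len(x) for row in P):
        raise ValueError("feature width does not match projection matrix")
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


# Hypothetical clients whose local encoders emit features of different widths.
CLIENT_FEATURE_DIMS = {"accelerometer": 128, "gyroscope": 96, "depth_camera": 512}


def latent_payloads(feature_dims, latent_dim=LATENT_DIM):
    """Return {client: (raw_width, latent_vector)} for one placeholder sample per client."""
    out = {}
    for seed, (name, width) in enumerate(feature_dims.items()):
        P = make_projection(width, latent_dim, seed)
        rng = random.Random(1000 + seed)
        x = [rng.random() for _ in range(width)]  # placeholder local features
        out[name] = (width, project(P, x))
    return out


if __name__ == "__main__":
    for name, (width, z) in latent_payloads(CLIENT_FEATURE_DIMS).items():
        print(f"{name}: {width} floats -> {len(z)} floats")
```

Running it prints one line per client, each ending in `-> 16 floats`. Clients with 96-, 128-, and 512-wide features all emit vectors of the same width, which is what lets a server compare them directly and keeps the exchanged vector size independent of each client's local architecture.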

Key Contribution

Multi-modal federated learning can be made communication-efficient and robust to outliers by learning a shared latent space, even with heterogeneous client architectures.

Abstract

Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, but applying FL to multi-modal settings introduces significant challenges. Clients typically possess heterogeneous modalities and model architectures, making it difficult to align feature spaces efficiently while preserving privacy and minimizing communication costs. To address this, we introduce CoMFed, a Communication-Efficient Multi-Modal Federated Learning framework that uses learnable projection matrices to generate compressed latent representations. A latent-space regularizer aligns these representations across clients, improving cross-modal consistency and robustness to outliers. Experiments on human activity recognition benchmarks show that CoMFed achieves competitive accuracy with minimal overhead.
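The latent-space regularizer can be illustrated under explicit assumptions. The sketch below assumes the regularizer is a squared-L2 penalty that pulls each client's latent vector toward a server-side consensus, and it swaps in a coordinate-wise median as that consensus so a single outlying client cannot drag it far. The paper's actual regularizer and aggregation rule may differ, and every function name here (`mean_consensus`, `median_consensus`, `consensus_penalty`, `penalty_grad`) is hypothetical.

```python
import statistics


def mean_consensus(latents):
    """Coordinate-wise mean of client latent vectors (plain averaging)."""
    return [statistics.fmean(col) for col in zip(*latents)]


def median_consensus(latents):
    """Coordinate-wise median: a standard robust aggregator, assumed here for illustration."""
    return [statistics.median(col) for col in zip(*latents)]


def consensus_penalty(z, consensus, lam):
    """Hypothetical regularizer lam * ||z - consensus||^2 added to a client's task loss."""
    return lam * sum((zi - ci) ** 2 for zi, ci in zip(z, consensus))


def penalty_grad(z, consensus, lam):
    """Gradient of consensus_penalty with respect to z: 2 * lam * (z - consensus)."""
    return [2.0 * lam * (zi - ci) for zi, ci in zip(z, consensus)]


# Three clients that roughly agree and one outlier, all in the same 2-D latent space.
latents = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [50.0, -40.0]]

if __name__ == "__main__":
    print("mean:  ", mean_consensus(latents))    # dragged toward the outlier
    print("median:", median_consensus(latents))  # stays near the agreeing clients
    c = median_consensus(latents)
    print("penalty for client 0:", consensus_penalty(latents[0], c, lam=0.5))
    print("gradient for client 0:", penalty_grad(latents[0], c, lam=0.5))
```

With this toy data the mean consensus lands far from the three agreeing clients, while the median stays at roughly `[1.1, 1.9]`, so the penalty on well-behaved clients remains small. The gradient helper gives the term a client would add to its task-loss gradient during local training.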

Distributed Systems & Hardware · Multimodal Models · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 11

Year: 2026

Venue: N/A
