This paper introduces CoMFed, a federated learning framework for multi-modal data that learns compressed latent representations, using projection matrices to align feature spaces across heterogeneous clients. A latent-space regularizer aligns these representations across clients, improving cross-modal consistency and robustness to outliers. Experiments on human activity recognition show that CoMFed achieves competitive accuracy with reduced communication overhead compared to existing methods.
Multi-modal federated learning can be made communication-efficient and robust to outliers by learning a shared latent space, even with heterogeneous client architectures.
Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, but applying FL to multi-modal settings introduces significant challenges. Clients typically possess heterogeneous modalities and model architectures, making it difficult to align feature spaces efficiently while preserving privacy and minimizing communication costs. To address this, we introduce CoMFed, a Communication-Efficient Multi-Modal Federated Learning framework that uses learnable projection matrices to generate compressed latent representations. A latent-space regularizer aligns these representations across clients, improving cross-modal consistency and robustness to outliers. Experiments on human activity recognition benchmarks show that CoMFed achieves competitive accuracy with minimal communication overhead.
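The idea of projecting heterogeneous client features into one compressed latent space and penalizing disagreement there can be illustrated with a minimal NumPy sketch. This is not the CoMFed implementation: the modality names, feature dimensions, random (untrained) projection matrices, and the choice of a coordinate-wise median as the shared consensus are all assumptions made for illustration, and the abstract does not specify the form of the regularizer.

```python
import numpy as np


def project(features, projection):
    """Map client features of shape (n, d_k) into the shared latent space (n, r)."""
    return features @ projection


def latent_alignment_penalty(client_latents, consensus):
    """Mean squared distance between each client's mean latent vector and the consensus."""
    distances = [np.sum((z.mean(axis=0) - consensus) ** 2) for z in client_latents]
    return float(np.mean(distances))


rng = np.random.default_rng(0)
latent_dim = 8  # hypothetical compressed dimension r

# Hypothetical clients, each with its own modality and feature width d_k.
feature_dims = {"accelerometer": 32, "gyroscope": 48, "video": 128}

# One projection matrix per client; randomly initialised here, learnable in practice.
projections = {
    name: rng.normal(scale=d ** -0.5, size=(d, latent_dim))
    for name, d in feature_dims.items()
}

# Stand-in local feature batches (16 samples per client).
features = {name: rng.normal(size=(16, d)) for name, d in feature_dims.items()}
latents = {name: project(features[name], projections[name]) for name in feature_dims}

# Assumed consensus: the coordinate-wise median of client summaries, which is
# less sensitive to a single outlying client than the mean.
summaries = np.stack([z.mean(axis=0) for z in latents.values()])
consensus = np.median(summaries, axis=0)

penalty = latent_alignment_penalty(list(latents.values()), consensus)
print(f"alignment penalty: {penalty:.4f}")
```

In this sketch every client's output has the same width `latent_dim` regardless of its input width, which is what lets clients with different architectures be compared, and a per-client summary of that size is much smaller than the raw features. In training, a penalty of this kind would be added to each client's task loss so that the projection matrices are pushed toward agreement.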