This paper introduces BLOSSOM, a task-agnostic federated learning framework for multimodal data in which modalities are shared across clients but sparsely observed. BLOSSOM uses a block-wise aggregation strategy: shared model components are aggregated across clients, while task-specific blocks stay private, giving partial personalization that addresses client and task heterogeneity. Experiments on multiple multimodal datasets show that BLOSSOM outperforms full-model aggregation, with average performance gains of 18.7% in modality-incomplete scenarios and 37.7% in modality-exclusive scenarios.
BLOSSOM's block-wise personalization lets multimodal federated learning cope with clients that are missing modalities, improving performance over full-model aggregation by an average of 18.7% in modality-incomplete settings and 37.7% in modality-exclusive settings.
Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often missing modalities. However, most existing FL approaches assume uniform modality availability, limiting their applicability in practice. We introduce BLOSSOM, a task-agnostic framework for multimodal FL designed to operate under shared and sparsely observed modality conditions. BLOSSOM supports clients with arbitrary modality subsets and enables flexible sharing of model components. To address client and task heterogeneity, we propose a block-wise aggregation strategy that selectively aggregates shared components while keeping task-specific blocks private, enabling partial personalization. We evaluate BLOSSOM on multiple diverse multimodal datasets and analyse the effects of missing modalities and personalization. Our results show that block-wise personalization significantly improves performance, particularly in settings with severe modality sparsity. In modality-incomplete scenarios, BLOSSOM achieves an average performance gain of 18.7% over full-model aggregation, while in modality-exclusive settings the gain increases to 37.7%, highlighting the importance of block-wise learning for practical multimodal FL systems.
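The block-wise aggregation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the block names (`audio_encoder`, `video_encoder`, `head`), the choice of which blocks count as shared, and the sample-count weighting (borrowed from FedAvg-style averaging) are all assumptions made for illustration. Each client holds only the blocks for the modalities it observes; a shared block is averaged over the clients that hold it, and every other block is left untouched on its client.

```python
from typing import Dict, List, Set

Params = List[float]            # flat parameter vector for one block
ClientModel = Dict[str, Params]  # block name -> parameters


def blockwise_aggregate(
    clients: List[ClientModel],
    num_samples: List[int],
    shared_blocks: Set[str],
) -> List[ClientModel]:
    """Average each shared block over the clients that hold it.

    Blocks not listed in `shared_blocks` (e.g. task-specific heads) stay
    private. Sample-count weighting is an assumption of this sketch.
    """
    averaged: Dict[str, Params] = {}
    for block in shared_blocks:
        # Only clients that actually observe this modality contribute.
        holders = [i for i, c in enumerate(clients) if block in c]
        if not holders:
            continue
        total = sum(num_samples[i] for i in holders)
        dim = len(clients[holders[0]][block])
        averaged[block] = [
            sum(num_samples[i] * clients[i][block][k] for i in holders) / total
            for k in range(dim)
        ]

    updated: List[ClientModel] = []
    for c in clients:
        # Replace shared blocks with the average; copy private blocks as-is.
        updated.append(
            {
                name: list(averaged[name]) if name in averaged else list(params)
                for name, params in c.items()
            }
        )
    return updated


# Hypothetical example: three clients with different modality subsets.
clients = [
    {"audio_encoder": [1.0, 2.0], "head": [0.5]},
    {"audio_encoder": [3.0, 4.0], "video_encoder": [10.0], "head": [-0.5]},
    {"video_encoder": [20.0], "head": [0.0]},
]
num_samples = [1, 1, 3]
result = blockwise_aggregate(
    clients, num_samples, shared_blocks={"audio_encoder", "video_encoder"}
)
```

In this toy setup, client 0 never receives a `video_encoder` block and every `head` stays local, which mirrors the abstract's distinction between aggregated shared components and private task-specific blocks. How BLOSSOM actually partitions and weights blocks is not specified in the text above.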