MBZUAI | Feb 26, 2026 | arXiv:2602.22917

Towards Multimodal Domain Generalization with Few Labels

Hongzhao Li, Hualei Wan, Shupan Li, Mingliang Xu, Muhammad Haris Khan

AI Summary

This paper introduces Semi-Supervised Multimodal Domain Generalization (SSMDG), a new problem setting for learning robust multimodal models from multi-source data with limited labeled samples. To address it, the authors propose a unified framework that combines Consensus-Driven Consistency Regularization for reliable pseudo-labeling, Disagreement-Aware Regularization for handling ambiguous samples, and Cross-Modal Prototype Alignment for domain- and modality-invariant representations. Experiments on newly established SSMDG benchmarks show that the proposed method outperforms existing approaches in both standard and missing-modality scenarios.

Key Contribution

Achieve robust multimodal generalization with few labels by exploiting both consensus and disagreement among modalities, even when some modalities are missing.

Abstract

Multimodal models ideally should generalize to unseen domains while remaining data-efficient to reduce annotation costs. To this end, we introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG), which aims to learn robust multimodal models from multi-source data with few labeled samples. We observe that existing approaches fail to address this setting effectively: multimodal domain generalization methods cannot exploit unlabeled data, semi-supervised multimodal learning methods ignore domain shifts, and semi-supervised domain generalization methods are confined to single-modality inputs. To overcome these limitations, we propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, which obtains reliable pseudo-labels through confident fused-unimodal consensus; Disagreement-Aware Regularization, which effectively utilizes ambiguous non-consensus samples; and Cross-Modal Prototype Alignment, which enforces domain- and modality-invariant representations while promoting robustness under missing modalities via cross-modal translation. We further establish the first SSMDG benchmarks, on which our method consistently outperforms strong baselines in both standard and missing-modality scenarios. Our benchmarks and code are available at https://github.com/lihongzhao99/SSMDG.
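The abstract describes selecting pseudo-labels through "confident fused-unimodal consensus" and routing the remaining ambiguous samples to a separate regularizer. The sketch below illustrates one plausible reading of that selection step; it is not the authors' implementation. The function names, the 0.95 threshold, and the rule that every unimodal prediction must agree with the fused prediction are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def consensus_pseudo_label(
    fused: Sequence[float],
    unimodal: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed value, not taken from the paper
) -> Optional[int]:
    """Return a pseudo-label if the fused prediction is confident and
    every unimodal prediction agrees with it; otherwise return None."""
    label = argmax(fused)
    if fused[label] < threshold:
        return None  # fused prediction is not confident enough
    if any(argmax(p) != label for p in unimodal):
        return None  # at least one modality disagrees
    return label


def split_batch(
    fused_batch: Sequence[Sequence[float]],
    unimodal_batch: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.95,
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split unlabeled samples into (index, pseudo-label) consensus pairs
    and a list of indices for ambiguous, non-consensus samples."""
    consensus: List[Tuple[int, int]] = []
    ambiguous: List[int] = []
    for idx, (fused, unimodal) in enumerate(zip(fused_batch, unimodal_batch)):
        label = consensus_pseudo_label(fused, unimodal, threshold)
        if label is None:
            ambiguous.append(idx)
        else:
            consensus.append((idx, label))
    return consensus, ambiguous
```

Under these assumptions, the consensus pairs would feed a consistency loss, while the ambiguous indices would be passed to a disagreement-aware term instead of being discarded.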

Data Curation & Synthetic Data | Multimodal Models | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 47

Year: 2026

Venue: N/A
