Scuola Internazionale Superiore di Studi Avanzati Trieste | Mar 2, 2026 | arXiv:2603.01965

CoVAE: correlated multimodal generative modeling

AI Summary

The paper introduces Correlated Variational Autoencoders (CoVAE), a generative architecture designed to preserve the joint statistical structure of multimodal data. Existing multimodal VAEs rely on fusion in latent space, which destroys that structure; CoVAE instead models the correlations between modalities explicitly. Experiments on real and synthetic datasets show accurate cross-modal reconstruction and effective quantification of the associated uncertainties.

Key Contribution

CoVAE preserves the inter-modal correlations that fusion-based multimodal VAEs lose, supporting accurate cross-modal reconstruction together with estimates of the associated uncertainty.

Abstract

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
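
The abstract does not specify CoVAE's architecture, so the following is not the authors' method. It is only a minimal sketch of why preserving correlation matters for cross-modal uncertainty, assuming a toy setting with two scalar, zero-mean Gaussian latent variables (one per modality) that have correlation `rho`. The function names and the value `rho = 0.8` are illustrative assumptions. In this toy case, observing one latent shifts the mean of the other and shrinks its variance; with `rho = 0` the observation carries no information.

```python
import math
import random


def conditional_gaussian(z_a, rho, var_a=1.0, var_b=1.0):
    """Mean and variance of z_b given z_a for a zero-mean bivariate Gaussian.

    Toy illustration only; not the CoVAE model.
    """
    cov_ab = rho * math.sqrt(var_a * var_b)
    mean = cov_ab / var_a * z_a
    var = var_b - cov_ab ** 2 / var_a
    return mean, var


def sample_pair(rho, rng):
    """Draw (z_a, z_b) with unit variances and correlation rho."""
    z_a = rng.gauss(0.0, 1.0)
    eps = rng.gauss(0.0, 1.0)
    z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * eps
    return z_a, z_b


def empirical_correlation(pairs):
    """Pearson correlation of a list of (a, b) samples."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [sample_pair(0.8, rng) for _ in range(20000)]

print("empirical correlation:", empirical_correlation(pairs))
print("correlated latents, z_a = 1:", conditional_gaussian(1.0, 0.8))
print("independent latents, z_a = 1:", conditional_gaussian(1.0, 0.0))
```

With correlated latents the conditional variance is `1 - rho**2`, which is smaller than the prior variance; with independent latents it stays at the prior variance, so a cross-modal prediction would be no more certain than an unconditional sample.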

Architecture Design (Transformers, SSMs, MoE), Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
