The paper introduces Correlated Variational Autoencoders (CoVAE), a generative architecture designed to preserve the joint statistical structure of multimodal data. It addresses a limitation of existing multimodal VAEs, whose latent-space fusion strategies discard that structure. By explicitly modeling correlations between modalities, CoVAE aims to improve cross-modal reconstruction and uncertainty quantification. Experiments on real and synthetic datasets report accurate cross-modal reconstruction and effective uncertainty quantification.
CoVAE lets a multimodal VAE preserve inter-modal correlations, which the authors link to accurate cross-modal reconstruction and effective uncertainty estimates.
Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
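To make the role of inter-modal correlation concrete, here is a minimal sketch. It is not the CoVAE architecture, which the abstract does not specify. It assumes a toy setting in which each of two modalities has a one-dimensional Gaussian latent and the pair is jointly Gaussian with correlation `rho`. The function name `conditional_gaussian` and all of its parameters are hypothetical and used only for illustration. Under these assumptions, the standard bivariate Gaussian conditioning formula gives a cross-modal prediction and its uncertainty.

```python
import math


def conditional_gaussian(z1, mu1, mu2, sigma1, sigma2, rho):
    """Return (mean, std) of z2 given z1 under a bivariate Gaussian.

    Hypothetical toy model: z1 and z2 stand in for the latents of two
    modalities; rho is their correlation. This is textbook Gaussian
    conditioning, not the method proposed in the paper.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if sigma1 <= 0.0 or sigma2 <= 0.0:
        raise ValueError("standard deviations must be positive")
    # Observing z1 shifts the mean of z2 in proportion to rho.
    cond_mean = mu2 + rho * (sigma2 / sigma1) * (z1 - mu1)
    # The correlation also shrinks the remaining uncertainty about z2.
    cond_std = sigma2 * math.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_std


# Correlated latents: observing modality 1 informs modality 2.
correlated = conditional_gaussian(1.0, 0.0, 0.0, 1.0, 1.0, rho=0.8)

# Correlation discarded (rho = 0): the observation carries no information.
independent = conditional_gaussian(1.0, 0.0, 0.0, 1.0, 1.0, rho=0.0)

print("correlated: ", correlated)
print("independent:", independent)
```

With `rho = 0`, the conditional distribution collapses to the marginal of `z2`: the prediction ignores the observed modality and the uncertainty is not reduced. This is a simplified picture of what is lost when the joint statistical structure is not retained.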