Multimodal sentiment analysis suffers from "branch imbalance," in which shared representations become redundant and private representations lose discriminative power; a new rebalancing framework is proposed to correct it.
Vision-language models (VLMs) can be fooled by modifying less than 2% of image pixels in a fixed, X-shaped pattern, which causes failures across diverse tasks such as classification, captioning, and question answering.
TSD shows that disentangling features into common, pairwise, and private subspaces substantially improves multimodal sentiment analysis performance.
By explicitly modeling a multi-level semantic hierarchy and carefully controlling information exchange between modalities, CLCR achieves state-of-the-art results in multimodal learning tasks ranging from emotion recognition to action recognition.