Decoupling masked reconstruction from contrastive alignment in audio-visual representation learning yields surprisingly large gains in zero-shot retrieval, outperforming the prior state of the art by a significant margin.