Search papers, labs, and topics across Lattice.
2
0
5
Forget auxiliary encoders and handcrafted losses: LVRPO uses reinforcement learning to directly align language and vision, boosting performance across a range of multimodal tasks.
Stop treating generated images like real ones: GMAIL aligns them as separate modalities in a shared latent space, unlocking significant gains in vision-language tasks.