Mohamed bin Zayed University of Artificial Intelligence
Forget choosing just one vision encoder – fusing CLIP and DINO representations unlocks a significant performance boost in vision-language tasks.