Forget choosing just one vision encoder – fusing CLIP and DINO representations unlocks a significant performance boost in vision-language tasks.
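
To make "fusing" concrete, here is a minimal sketch of one common fusion scheme: concatenating the two encoders' per-patch features channel-wise and applying a linear projection. The source does not say which fusion method is used, so the scheme itself is an assumption, as are the function name `fuse_patch_features`, the toy dimensions, and the hand-picked weights. In practice the features would come from real CLIP and DINO encoders, and the projection would be learned.

```python
def fuse_patch_features(clip_feats, dino_feats, weight, bias):
    """Fuse two encoders' patch features (hypothetical helper, not from the source).

    clip_feats: list of per-patch feature vectors from a CLIP-style encoder
    dino_feats: list of per-patch feature vectors from a DINO-style encoder
    weight:     projection matrix, one row per output channel
    bias:       one bias value per output channel
    """
    if len(clip_feats) != len(dino_feats):
        raise ValueError("both encoders must yield the same number of patches")

    fused = []
    for clip_vec, dino_vec in zip(clip_feats, dino_feats):
        # Channel-wise concatenation: [clip channels..., dino channels...]
        x = clip_vec + dino_vec
        if any(len(row) != len(x) for row in weight):
            raise ValueError("weight rows must match the concatenated width")
        # Linear projection to the shared output width
        y = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weight, bias)]
        fused.append(y)
    return fused


# Toy inputs: 2 patches, 2 CLIP channels, 3 DINO channels, 2 output channels.
clip = [[1.0, 2.0], [0.0, 1.0]]
dino = [[0.5, 0.5, 0.0], [1.0, 0.0, 2.0]]
weight = [[1, 0, 0, 0, 0],   # copies the first CLIP channel
          [0, 0, 0, 0, 1]]   # copies the last DINO channel
bias = [0.0, 0.5]

fused = fuse_patch_features(clip, dino, weight, bias)
print(fused)
```

Concatenation keeps both feature sets intact and leaves the mixing to the projection. Other options, such as summing the features or fusing with cross-attention, trade that simplicity for different costs.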