Forget choosing just one vision encoder – fusing CLIP and DINO representations unlocks a significant performance boost in vision-language tasks.
Achieve state-of-the-art video polyp segmentation by adaptively selecting informative reference frames and aggregating multi-scale historical features with causal attention.