CLIP's image tokens struggle to aggregate information from spatially or semantically related regions. DeCLIP addresses this by decoupling the self-attention module and distilling knowledge from vision foundation models (VFMs) and diffusion models.