Forget CLIP: initializing vision encoders from text-only LLMs unlocks surprising gains in visual fidelity and data efficiency for VLMs, rivaling or surpassing larger, contrastively pretrained models.