Search papers, labs, and topics across Lattice.
2
0
6
2
VLMs are surprisingly bad at visually matching objects unless they can name them, revealing a critical reliance on textual anchors that overshadows their visual processing capabilities.
VLMs can regain lost linguistic prowess without extra parameters or architectural changes, thanks to a clever KV-cache sharing trick for distillation.