MBZUAI
Multimodal models can be adapted to be more culturally relevant to specific regions, boosting performance by 5-15% without sacrificing global generalization.
VLMs struggle with basic counting not because they can't "see" the objects, but because they forget to look when generating the answer.
VLMs can regain lost linguistic prowess without extra parameters or architectural changes, thanks to a clever KV-cache sharing trick for distillation.