Search papers, labs, and topics across Lattice.
1
0
3
Multilingual vision-language models can achieve surprisingly strong performance (36% on MMMU) simply by training on translated data and aligning with parallel text corpora.