Large multimodal models (LMMs) struggle to ground text queries in the right parts of images, but explicitly modeling salient visual subjects can dramatically improve cross-modal retrieval.