Multimodal models stumble badly on low-resource Southeast Asian languages, as revealed by the new SEA-Vision benchmark for document and scene text understanding.
Ditch imperfect human annotations: this dual-reward RL approach trains image captioning models to be both more complete and more factually correct.
By grounding reasoning within the topology of a global interaction graph, ManCAR achieves a relative improvement of up to 46.88% in NDCG@10 over state-of-the-art sequential recommendation baselines.