Southwest Jiaotong University
Multimodal large language models (MLLMs) have revolutionized visual question answering (VQA), but they still struggle with visual grounding and balanced multimodal fusion, which limits their reliability.