Xinjiang University
Current MLLM benchmarks are missing the forest for the trees: Agentic-MME reveals that strong final-answer accuracy masks surprisingly poor tool use and planning in complex multimodal tasks.