Current MLLM benchmarks are missing the forest for the trees: Agentic-MME reveals that strong final-answer accuracy masks surprisingly poor tool use and planning in complex multimodal tasks.
LLMs, like humans, exhibit a "frequency bias," performing better when prompted and fine-tuned with more common textual expressions.