The Hong Kong Polytechnic University
Multimodal large language models (MLLMs) fail to recognize and effectively use physical tools, with top models achieving only 21% task completion in real-world scenarios.
Images can serve as a powerful standalone medium for reasoning, achieving nearly double the token efficiency of traditional text methods.