Search papers, labs, and topics across Lattice.
1
0
3
2
Today's best multimodal models can only solve half of compositional visual tool-use tasks, revealing a critical gap in their ability to plan and execute complex, multi-step visual reasoning.