Multimodal agents can now continually improve their tool use and orchestration in open-ended settings without parameter updates, thanks to a novel dual-stream framework that learns from both past experiences and structured skills.
Even the best multimodal agents struggle with realistic visual scenarios, achieving only 27% accuracy on the new AgentVista benchmark that demands long-horizon tool use across web search, image search, and code.