Vision-language models struggle to adapt plans based on visual input alone, revealing a critical gap in their ability to use what they see when things don't go as expected.
Forget task-specific models: Magma, a single foundation model, now outperforms them in both UI navigation and robotic manipulation by bridging verbal and action abilities.