Mohamed bin Zayed University of Artificial Intelligence; University of Technology Sydney
Even the best vision-language models struggle to reliably set fine-grained GUI states, achieving only 33% accuracy on a new benchmark, but targeted visual hints suggest a clear path to improvement.