A 3B-parameter model can match the performance of models more than twice its size in mobile GUI automation by distilling visual history into concise natural-language summaries.
Current VLM-driven embodied agents struggle with fundamental skills like navigation and object manipulation when evaluated in realistic, low-level action spaces, severely hindering their performance on complex tasks.