Personalizing mobile GUI agents for privacy requires navigating structurally different execution trajectories, and TIPO offers a way to do so.
GUI agents struggle with long tasks not because they mis-click but because they forget what they were doing. A new "anchored memory" method can fix this.
Video fine-tuning boosts MLLMs' video smarts, but surprisingly dumbs them down on static images – a trade-off you can't simply brute-force away with more frames.
The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.