Search papers, labs, and topics across Lattice.
5
0
6
18
GUI agents struggle with long tasks not because they mis-click, but because they forget what they were doing, and a new "anchored memory" method can fix it.
Video fine-tuning boosts MLLMs' video smarts, but surprisingly dumbs them down on static images – a trade-off you can't simply brute-force away with more frames.
Shrinking visual document retrieval storage by 95% is now possible without sacrificing accuracy, thanks to a layout-aware parsing strategy.
Multi-vector visual document retrieval gets a speed boost without sacrificing accuracy thanks to a novel "Prune-then-Merge" approach that intelligently compresses visual features.
The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.