GUI agents struggle with long tasks not because they mis-click but because they forget what they were doing; a new "anchored memory" method is proposed to fix this.
Current MLLMs are surprisingly poor at understanding step-level human intent in egocentric videos, reaching only 33% accuracy on a new benchmark designed to prevent future-frame leakage.
Ditch the pixel-level rendering and external executors: LatentGeo learns continuous latent visual representations to internalize auxiliary geometric constructions for multimodal geometric reasoning, boosting performance on complex geometry problems.
Shrinking visual document retrieval storage by 95% is now possible without sacrificing accuracy, thanks to a layout-aware parsing strategy.
Multi-vector visual document retrieval gets a speed boost without sacrificing accuracy, thanks to a novel "Prune-then-Merge" approach that compresses visual features.
The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.