Current multimodal large language models (MLLMs) are surprisingly poor at inferring human intent step by step in egocentric videos. They reach only 33% accuracy on a new benchmark designed to prevent leakage of information from future frames.
The first comprehensive survey of Visual Document Retrieval examines how MLLMs are reshaping the field. It highlights a shift toward retrieval-augmented generation (RAG) and agentic systems for complex document understanding.