Hong Kong University of Science and Technology (Guangzhou), Hong Kong University of Science and Technology
VLMs are surprisingly bad at 3D spatial reasoning in panoramic images, but a new RL-based training method closes the gap.
The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.