Yibo Yan

Papers on Lattice

Research focus

Multimodal Models (4), Computer Vision (3), Recommendation & Information Retrieval (3), Eval Frameworks & Benchmarks (2), Tool Use & Agents (1)

Frequent co-authors

Xuming Hu (4), Jungang Li (3), Yu Huang (3), Mingdong Ou (3)

Papers (5)

Mar 19, 2026

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

GUI agents struggle with long tasks not because they mis-click, but because they forget what they were doing, and a new "anchored memory" method can fix it.

Yi Shi, Jungang Li, Linghao Zhang +25

Eval Frameworks & Benchmarks, Tool Use & Agents

Mar 18, 2026

Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models

Video fine-tuning boosts MLLMs' video smarts, but surprisingly dumbs them down on static images – a trade-off you can't simply brute-force away with more frames.

Linghao Zhang, Jungang Li, Yonghua Hei +12

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Mar 2, 2026

Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations

Shrinking visual document retrieval storage by 95% is now possible without sacrificing accuracy, thanks to a layout-aware parsing strategy.

Yibo Yan, Mingdong Ou, Mingdong Ou +7

Computer Vision, Multimodal Models, Recommendation & Information Retrieval

Feb 23, 2026

Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

Multi-vector visual document retrieval gets a speed boost without sacrificing accuracy thanks to a novel "Prune-then-Merge" approach that intelligently compresses visual features.

Yibo Yan, Mingdong Ou, Mingdong Ou +7

Inference & Quantization, Multimodal Models, Recommendation & Information Retrieval

Feb 23, 2026

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.

Yibo Yan, Jiahao Huo, Guanbo Feng +14

Computer Vision, Multimodal Models, Recommendation & Information Retrieval
