Xuming Hu

Research focus

Multimodal Models (5) · Recommendation & Information Retrieval (3) · Eval Frameworks & Benchmarks (2) · Computer Vision (2) · Tool Use & Agents (1)

Frequent co-authors

Yibo Yan (4) · Jiahao Huo (4) · Yu Huang (3) · Mingdong Ou (3)

Papers (6)

Mar 19, 2026

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

GUI agents struggle with long tasks not because they mis-click, but because they forget what they were doing, and a new "anchored memory" method can fix it.

Yi Shi, Jungang Li, Linghao Zhang +25

Eval Frameworks & Benchmarks · Tool Use & Agents

Mar 12, 2026·also HKUST

EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next

Current MLLMs are surprisingly bad at understanding human intent in egocentric videos at a step-by-step level, achieving only 33% accuracy on a new benchmark designed to prevent future-frame leakage.

Ye Pan, Chi Kit Wong, Chifai Wong +6

Eval Frameworks & Benchmarks · Multimodal Models · Robotics & Embodied AI

Mar 12, 2026

LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning

Ditch the pixel-level rendering and external executors: LatentGeo learns continuous latent visual representations to internalize auxiliary geometric constructions for multimodal geometric reasoning, boosting performance on complex geometry problems.

Haiying Xu, Zihan Wang, Song Dai +3

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Reasoning & Chain-of-Thought

Mar 2, 2026·also Chemical and Biomolecular Engineering

Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations

Shrinking visual document retrieval storage by 95% is now possible without sacrificing accuracy, thanks to a layout-aware parsing strategy.

Yibo Yan, Mingdong Ou +7

Computer Vision · Multimodal Models · Recommendation & Information Retrieval

Feb 23, 2026·also Chemical and Biomolecular Engineering

Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

Multi-vector visual document retrieval gets a speed boost without sacrificing accuracy thanks to a novel "Prune-then-Merge" approach that intelligently compresses visual features.

Yibo Yan, Mingdong Ou +7

Inference & Quantization · Multimodal Models · Recommendation & Information Retrieval

Feb 23, 2026·also Chemical and Biomolecular Engineering

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.

Yibo Yan, Jiahao Huo, Guanbo Feng +14

Computer Vision · Multimodal Models · Recommendation & Information Retrieval
