Yuanhuiyi Lyu

LVLMs can achieve SOTA visual reasoning by learning to "see" in a way that optimizes for reasoning, even if it means deviating from strict geometric accuracy.

Yangfu Li, Yuning Gong, Hongjian Zhan +4

Computer Vision Multimodal Models Reasoning & Chain-of-Thought

Apr 8, 2026 · also HKUST

TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

Doubling the number of tokens in a ViT-based autoencoder, combined with staged compression and self-supervised pretraining, dramatically improves generative performance under deep compression, without increasing the latent budget.

Ziyuan Huang, Yangfu Li, Yuanhuiyi Lyu +1

Architecture Design (Transformers, SSMs, MoE) Computer Vision Inference & Quantization

Mar 12, 2026 · also HKUST

EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next

Current MLLMs are surprisingly bad at understanding human intent in egocentric videos at a step-by-step level, achieving only 33% accuracy on a new benchmark designed to prevent future-frame leakage.

Ye Pan, Chifai Wong, Chi Kit Wong +6

Eval Frameworks & Benchmarks Multimodal Models Robotics & Embodied AI

Feb 23, 2026 · also Chemical and Biomolecular Engineering

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.

Yibo Yan, Jiahao Huo, Guanbo Feng +14

Computer Vision Multimodal Models Recommendation & Information Retrieval

Papers (5)