University of Chinese Academy of Sciences
Current MLLMs are still surprisingly reliant on textual reasoning, even when visual information is crucial for solving STEM problems.
Continual learning methods for Video-LLMs face a fundamental trade-off: mitigating catastrophic forgetting often comes at the cost of generalization or prohibitive computational overhead.
Quantizing large vision-language models just got a whole lot better: a new token-level sensitivity metric closes the accuracy gap with full-precision models by up to 1.6% in 3-bit weight-only quantization.
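The blurb above doesn't spell out how the token-level sensitivity metric is defined, so the following is only a minimal sketch of the general idea: using calibration activations to weight the rounding error when choosing per-channel weight-quantization scales. The function name, the use of mean squared activations as the sensitivity proxy, and the scale search grid are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: "sensitivity" here is per-input-channel activation
# energy from calibration tokens, NOT the metric defined in the paper above.
import numpy as np

def quantize_weight_sensitivity_aware(W, X_calib, n_bits=3, grid=None):
    """Pick per-output-channel scales that minimize a sensitivity-weighted
    rounding error for weight matrix W (out_features x in_features).

    W       : (out, in) float weights
    X_calib : (tokens, in) calibration activations feeding this layer
    """
    # Per-input-channel sensitivity proxy: mean squared activation over tokens.
    sens = (X_calib ** 2).mean(axis=0)                    # (in,)
    qmax = 2 ** (n_bits - 1) - 1                          # symmetric integer grid
    if grid is None:
        grid = np.linspace(0.5, 1.2, 15)                  # candidate scale shrink factors

    W_q = np.empty_like(W)
    for o in range(W.shape[0]):
        w = W[o]
        best_err, best_wq = np.inf, None
        for g in grid:
            scale = g * np.abs(w).max() / qmax
            q = np.clip(np.round(w / scale), -qmax - 1, qmax)
            w_hat = q * scale
            # Sensitivity-weighted reconstruction error instead of plain MSE.
            err = (sens * (w - w_hat) ** 2).sum()
            if err < best_err:
                best_err, best_wq = err, w_hat
        W_q[o] = best_wq
    return W_q

# Tiny usage example with random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)).astype(np.float32)
X = rng.normal(size=(32, 16)).astype(np.float32)
W3 = quantize_weight_sensitivity_aware(W, X, n_bits=3)
print("mean |W - W3|:", np.abs(W - W3).mean())
```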
Latent visual reasoning in multimodal LLMs is largely ineffective: the "imagination" happening in latent space neither attends to the input nor influences the output, making explicit text-based imagination a surprisingly better alternative.
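As a purely illustrative diagnostic (not the analysis from the paper above), the snippet below shows one way to quantify the first part of that claim: given row-stochastic attention weights and assumed index ranges for image-patch tokens and latent "imagination" slots, it measures how much of the latent tokens' attention mass actually lands on the visual input. All names and index ranges are hypothetical.

```python
# Illustrative diagnostic only: how much do latent "imagination" tokens
# attend to the visual input, averaged over heads and latent positions?
import numpy as np

def latent_to_visual_attention_mass(attn, visual_idx, latent_idx):
    """attn: (heads, seq, seq) row-stochastic attention weights."""
    rows = attn[:, latent_idx, :]                          # attention paid BY latent tokens
    mass_on_visual = rows[:, :, visual_idx].sum(axis=-1)   # (heads, n_latent)
    return mass_on_visual.mean()

# Toy example with random attention weights.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 24, 24))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
visual_idx = np.arange(0, 8)     # assumed: first 8 positions are image patches
latent_idx = np.arange(20, 24)   # assumed: last 4 positions are latent reasoning slots
print("latent-to-visual attention mass:",
      latent_to_visual_attention_mass(attn, visual_idx, latent_idx))
```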