Changlin Li

Achieve real-time video understanding with transparent reasoning: the proposed model aligns response timing with visual evidence, offering a breakthrough for online video LLMs.

Kecheng Zhang, Zongxin Yang, Mingfei Han +6

Computer Vision · Multimodal Models · Tool Use & Agents

Mar 31, 2026

Haihong Hao +7·Mar 31, 2026·also Tencent AI

LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning

By "dreaming ahead" with learned latent visual dynamics, LatentPilot achieves state-of-the-art vision-and-language navigation, demonstrating the power of future-aware reasoning without needing future observations at test time.

Haihong Hao, Mingfei Han, Changling Li +5

Multimodal Models · Robotics & Embodied AI · World Models & Planning

Sep 28, 2025

Sep 28, 2025·also CAS, Corresponding authors, ECNU, Fudan +8

HunyuanImage 3.0 Technical Report

The largest open-source image generative model to date, HunyuanImage 3.0, achieves state-of-the-art performance using a Mixture-of-Experts architecture and native Chain-of-Thoughts schema.

Siyu Cao, Hangting Chen, Peng Chen +7132

Architecture Design (Transformers, SSMs, MoE) · Data Curation & Synthetic Data · Multimodal Models

Papers (4)