Ming-Hsuan Yang

MLLMs can revolutionize video understanding by integrating watching, remembering, and reasoning into a cohesive framework that addresses long-range dependencies and sparse evidence.

Jiahao Meng, Kuan Gao, Weisong Liu +8

Computer Vision · Multimodal Models

Insta360 Research · 5d ago · also Beihang, Jilin, SJTU, University of California at Merced +1

UniSHARP: Universal Sharp Monocular View Synthesis

UniSHARP achieves unprecedented photorealistic view synthesis across a continuum of camera systems, outperforming traditional methods by a substantial margin.

Meixi Song, Dizhe Zhang, Hao Ren +4

Computer Vision · Multimodal Models

May 21, 2026

2w ago · also Hainan University, Jilin, Pengcheng Laboratory, SJTU +1

GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

Treating geometry as a fundamental representational prerequisite, rather than a late-fusion auxiliary signal, significantly boosts spatio-temporal reasoning in vision-language models.

Deshui Miao, Xingsen Huang, Yameng Gu +2

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

University of California · 2w ago · also Adobe Research, Jilin, SJTU, University of California at Merced

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Motion-controlled video generation can now produce more plausible and natural results by reasoning about motion and its consequences, rather than rigidly following user-defined trajectories.

Lee Hsin-Ying, Hanwen Jiang, Yiqun Mei +2

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

Papers (5)