Weiming Hu

State Key Laboratory of Multimodal Artificial Intelligence Systems, University of Chinese Academy of Sciences, ShanghaiTech University

Research focus

Multimodal Models (3), Computer Vision (2), Robotics & Embodied AI (1), Training Efficiency & Optimization (1), Eval Frameworks & Benchmarks (1)

Frequent co-authors

Bing Li (2), Xuesong Chen (1), Jin Gao (1), Fudong Ge (1)

Papers (4)

Apr 20, 2026

Voyager Research · Apr 20, 2026 · also CAS, Hello Inc, ShanghaiTech, SJTU +1

OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

Ditch the fragmented architectures: OneDrive unifies autonomous driving tasks within a single VLM decoder, achieving state-of-the-art performance while slashing latency.

Xuesong Chen, Jin Gao, Fudong Ge +2

Multimodal Models, Robotics & Embodied AI

Apr 14, 2026

Apr 14, 2026 · also Beihang, CAS, Hebei Key Laboratory of Computer Virtual, ShanghaiTech +2

SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker

Multimodal trackers can achieve state-of-the-art results without ballooning parameter counts by adaptively aligning cross-modal attention maps and using hierarchical mixture of experts for efficient global reasoning.

Junbin Su, Ziteng Xue, Shihui Zhang +2

Computer Vision, Multimodal Models, Training Efficiency & Optimization

Guanyi Qin +45 · Apr 14, 2026 · also Meta AI, CAS, Memories.ai Research, NTU +5

NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)

Current image quality metrics struggle to articulate *why* one high-quality image is better than another, but this challenge shows MLLMs are closing the gap by providing expert-level explanations.

Guanyi Qin, Jie Liang, Bingbing Zhang +43

Computer Vision, Eval Frameworks & Benchmarks

Apr 8, 2026

UW · Apr 8, 2026 · also CAS, Hellogroup, Jilin, ShanghaiTech +1

Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation

MLLMs can be tricked into missing 90% of harmful content simply by encoding it in images that humans can easily read.

Zhiheng Li, Zongyang Ma, Yuntong Pan +6

Constitutional AI & AI Ethics, Multimodal Models, Red-Teaming & Adversarial Robustness
