Yuanhan Zhang

S-Lab, SenseTime Research

Research focus

Computer Vision (2); Multimodal Models (2); Architecture Design (Transformers, SSMs, MoE) (1); Eval Frameworks & Benchmarks (1)

Frequent co-authors

Ziwei Liu (2), Penghao Wu (1), Yuhao Dong (1), Yuwei Niu (1)

Papers (2)

May 27, 2026 · also People's Public Security University of China, PKU, S-Lab, SenseTime

From Pixels to Words -- Towards Native One-Vision Models at Scale

Ditching modular architectures unlocks surprisingly competitive vision-language performance, proving that end-to-end pixel-to-word models can rival traditional approaches at scale.

Penghao Wu, Yuhao Dong, Yuwei Niu +9

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Multimodal Models

May 25, 2026 · also ERNIE Team, Monash, S-Lab, SenseTime +1

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

LLaVA-OV-2's codec-stream tokenization lets it crush existing video-language models, especially in tasks requiring fine-grained temporal understanding of high-frequency motion.

Xiang An, Yin Xie, Feilong Tang +22

Computer Vision; Eval Frameworks & Benchmarks; Multimodal Models
