Kaicheng Yang

Uniformly quantizing the entire diffusion action head of VLAs to W4A4 is not only possible, but can match or exceed FP16 performance, defying conventional wisdom and slashing memory footprint by 71%.

Mingze Li, Sicheng Lyu, Dongxiu Liu +4

Inference & Quantization · Multimodal Models · Robotics & Embodied AI

May 25, 2026

Xiang An +28 · 3w ago · also ERNIE Team, Monash, NTU, S-Lab +1

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

LLaVA-OV-2's codec-stream tokenization lets it crush existing video-language models, especially in tasks requiring fine-grained temporal understanding of high-frequency motion.

Xiang An, Yin Xie, Feilong Tang +26

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Apr 16, 2026

Shuo Tan +7 · Apr 16, 2026

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

Forget generic retrieval signals – UniDoc-RL uses reinforcement learning to teach LVLMs how to actively perceive and reason about visual information, yielding a 17.7% performance boost.

Shuo Tan, Zelong Sun, Tiancheng Gu +5

Multimodal Models · Recommendation & Information Retrieval · Tool Use & Agents

Kaicheng Yang

Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (4)