Weihang Chen

LLMs can be sped up by over 2x without sacrificing accuracy, by compressing the input and predicting multiple output tokens at once using a unified framework.

Wenhui Tan, Xiaoqian Ma, Siqi Fan +4

Architecture Design (Transformers, SSMs, MoE) Inference & Quantization Reasoning & Chain-of-Thought

Apr 13, 2026

Apr 13, 2026 · also HKU, NTU, USTC

Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

Omni-modal RL post-training just got a whole lot faster: Relax delivers up to 2x speedups over existing systems, even for massive MoE models, without sacrificing reward convergence.

Liujie Zhang, Benzhe Ning, Xiaoyan Yu +3

Multimodal Models RLHF & Preference Learning Tool Use & Agents

Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (3)