State Key Lab of Processors (SKLP), Institute of Computing Technology, Chinese Academy of Sciences
Wafer-scale SRAM CIM can deliver up to 17x better energy efficiency for LLM inference by eliminating off-chip data movement and using token-grained pipelining.
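The pipelining idea can be illustrated with a scheduling-only sketch (illustrative, not the paper's implementation): with per-layer stages mapped to different compute dies, token t can run layer l in the same cycle that token t+1 runs layer l-1, so new tokens enter the pipeline every cycle instead of waiting for the previous token to finish all layers.

```python
# Illustrative sketch of token-grained pipelining (scheduling only, not the
# paper's implementation): token t occupies layer-stage l at cycle t + l.

def pipeline_schedule(num_tokens, num_layers):
    """Return {cycle: [(token, layer), ...]} for a filled token pipeline."""
    schedule = {}
    for t in range(num_tokens):
        for l in range(num_layers):
            # Stage l of token t executes at cycle t + l, overlapping with
            # other tokens on the remaining stages.
            schedule.setdefault(t + l, []).append((t, l))
    return schedule

# 3 tokens over 4 layer-stages finish in 6 cycles instead of 12 sequential steps.
sched = pipeline_schedule(3, 4)
print(len(sched))
```

The total cycle count is num_tokens + num_layers - 1, versus num_tokens * num_layers for purely sequential execution, which is where the overlap pays off.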
A novel GPU-CPU-NDP architecture, TriMoE, unlocks 2.83x faster MoE inference by intelligently routing "hot," "warm," and "cold" experts to the compute unit where they thrive.
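A minimal sketch of tiered expert placement (the function name, tier fractions, and device assignments are illustrative assumptions, not TriMoE's actual policy): rank experts by how often they are activated, then map the hottest to the GPU, the warm middle to near-data processing, and the cold tail to the CPU.

```python
# Hypothetical sketch of hot/warm/cold expert placement; thresholds and
# device mapping are illustrative, not from the TriMoE paper.

def place_experts(activation_counts, hot_frac=0.1, warm_frac=0.3):
    """Rank experts by activation count and split into hot/warm/cold tiers."""
    ranked = sorted(activation_counts, key=activation_counts.get, reverse=True)
    n = len(ranked)
    n_hot = max(1, int(n * hot_frac))
    n_warm = int(n * warm_frac)
    placement = {}
    for i, expert in enumerate(ranked):
        if i < n_hot:
            placement[expert] = "GPU"  # hot: high reuse justifies HBM residency
        elif i < n_hot + n_warm:
            placement[expert] = "NDP"  # warm: compute near the memory it lives in
        else:
            placement[expert] = "CPU"  # cold: rarely activated, cheapest home
    return placement

counts = {f"e{i}": c for i, c in enumerate([900, 40, 5, 300, 2, 60, 1, 800])}
print(place_experts(counts))
```

In practice such counts would be gathered online, and the placement refreshed as the activation distribution drifts across workloads.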
LLM serving gets a boost from PAM, a hierarchical memory architecture that intelligently distributes and processes key-value pairs across heterogeneous PIM devices, slashing memory bottlenecks.
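One way to picture distributing key-value data across heterogeneous PIM devices is a bandwidth-proportional split (a sketch under assumed device names and bandwidths, not the PAM paper's actual policy): give each device a share of KV-cache blocks proportional to its bandwidth, so attention reads finish at roughly the same time everywhere.

```python
# Hypothetical bandwidth-proportional KV-cache partitioning; device names
# and bandwidth figures are illustrative, not from the PAM paper.

def partition_kv(num_blocks, bandwidths):
    """Return {device: block count}, proportional to each device's bandwidth."""
    total_bw = sum(bandwidths.values())
    shares = {d: int(num_blocks * bw / total_bw) for d, bw in bandwidths.items()}
    # Hand out any rounding remainder to the fastest devices first.
    leftover = num_blocks - sum(shares.values())
    for d in sorted(bandwidths, key=bandwidths.get, reverse=True)[:leftover]:
        shares[d] += 1
    return shares

print(partition_kv(100, {"hbm_pim": 60, "dimm_pim0": 25, "dimm_pim1": 15}))
```

Balancing by bandwidth rather than capacity is the natural choice here, since decode-time attention is bound by how fast each device can stream its KV blocks, not by how many it can hold.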