Tianhua Han

Papers on Lattice

Total citations

Topics

Research focus

Architecture Design (Transformers, SSMs, MoE) (1)Distributed Systems & Hardware (1)Inference & Quantization (1)

Frequent co-authors

Yintao He (1)Lian Liu (1)Zhirong Chen (1)Mengdi Wang (1)

Papers (1)

Mar 1, 2026

TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading

A novel GPU-CPU-NDP architecture, TriMoE, unlocks 2.83x faster MoE inference by intelligently routing "hot," "warm," and "cold" experts to the compute unit where they thrive.

Yintao He, Tianhua Han, Lian Liu +4

Architecture Design (Transformers, SSMs, MoE)Distributed Systems & Hardware Inference & Quantization

Search

Tianhua Han

Research focus

Frequent co-authors

Papers (1)