State Key Lab of Processors (SKLP), Institute of Computing Technology, Chinese Academy of Sciences
Wafer-scale SRAM CIM can deliver up to 17x better energy efficiency for LLM inference by eliminating off-chip data movement and using token-grained pipelining.
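The pipelining idea can be illustrated with a scheduling-only sketch (illustrative, not the paper's implementation): with per-layer stages mapped to different compute dies, token t can run layer l in the same cycle that token t+1 runs layer l-1, so new tokens enter the pipeline every cycle instead of waiting for the previous token to finish all layers.

```python
# Illustrative sketch of token-grained pipelining (scheduling only, not the
# paper's implementation): token t occupies layer-stage l at cycle t + l.

def pipeline_schedule(num_tokens, num_layers):
    """Return {cycle: [(token, layer), ...]} for a filled token pipeline."""
    schedule = {}
    for t in range(num_tokens):
        for l in range(num_layers):
            # Stage l of token t executes at cycle t + l, overlapping with
            # other tokens on the remaining stages.
            schedule.setdefault(t + l, []).append((t, l))
    return schedule

# 3 tokens over 4 layer-stages finish in 6 cycles instead of 12 sequential steps.
sched = pipeline_schedule(3, 4)
print(len(sched))
```

The total cycle count is num_tokens + num_layers - 1, versus num_tokens * num_layers for purely sequential execution, which is where the overlap pays off.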
A novel GPU-CPU-NDP architecture, TriMoE, unlocks 2.83x faster MoE inference by intelligently routing "hot," "warm," and "cold" experts to the compute unit where they thrive.
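A minimal sketch of tiered expert placement (the function name, tier fractions, and device assignments are illustrative assumptions, not TriMoE's actual policy): rank experts by how often they are activated, then map the hottest to the GPU, the warm middle to near-data processing, and the cold tail to the CPU.

```python
# Hypothetical sketch of hot/warm/cold expert placement; thresholds and
# device mapping are illustrative, not from the TriMoE paper.

def place_experts(activation_counts, hot_frac=0.1, warm_frac=0.3):
    """Rank experts by activation count and split into hot/warm/cold tiers."""
    ranked = sorted(activation_counts, key=activation_counts.get, reverse=True)
    n = len(ranked)
    n_hot = max(1, int(n * hot_frac))
    n_warm = int(n * warm_frac)
    placement = {}
    for i, expert in enumerate(ranked):
        if i < n_hot:
            placement[expert] = "GPU"  # hot: high reuse justifies HBM residency
        elif i < n_hot + n_warm:
            placement[expert] = "NDP"  # warm: compute near the memory it lives in
        else:
            placement[expert] = "CPU"  # cold: rarely activated, cheapest home
    return placement

counts = {f"e{i}": c for i, c in enumerate([900, 40, 5, 300, 2, 60, 1, 800])}
print(place_experts(counts))
```

In practice such counts would be gathered online, and the placement refreshed as the activation distribution drifts across workloads.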
LLM serving gets a boost from PAM, a hierarchical memory architecture that intelligently distributes and processes key-value pairs across heterogeneous PIM devices, slashing memory bottlenecks.
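One way to picture distributing key-value data across heterogeneous PIM devices is a bandwidth-proportional split (a sketch under assumed device names and bandwidths, not the PAM paper's actual policy): give each device a share of KV-cache blocks proportional to its bandwidth, so attention reads finish at roughly the same time everywhere.

```python
# Hypothetical bandwidth-proportional KV-cache partitioning; device names
# and bandwidth figures are illustrative, not from the PAM paper.

def partition_kv(num_blocks, bandwidths):
    """Return {device: block count}, proportional to each device's bandwidth."""
    total_bw = sum(bandwidths.values())
    shares = {d: int(num_blocks * bw / total_bw) for d, bw in bandwidths.items()}
    # Hand out any rounding remainder to the fastest devices first.
    leftover = num_blocks - sum(shares.values())
    for d in sorted(bandwidths, key=bandwidths.get, reverse=True)[:leftover]:
        shares[d] += 1
    return shares

print(partition_kv(100, {"hbm_pim": 60, "dimm_pim0": 25, "dimm_pim1": 15}))
```

Balancing by bandwidth rather than capacity is the natural choice here, since decode-time attention is bound by how fast each device can stream its KV blocks, not by how many it can hold.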