Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
TriMoE, a novel GPU-CPU-NDP (near-data processing) architecture, delivers 2.83x faster mixture-of-experts (MoE) inference by routing "hot," "warm," and "cold" experts to the compute unit best suited to each.
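To make the tiering idea concrete, here is a minimal sketch of how experts might be split into three tiers by activation frequency. It is not TriMoE's actual policy: the function name `assign_tiers`, the fixed tier sizes `hot_k` and `warm_k`, and the mapping of hot experts to the GPU, warm to the CPU, and cold to NDP are all assumptions made for illustration.

```python
def assign_tiers(activation_counts, hot_k, warm_k):
    """Map each expert id to a device label by activation rank.

    Hypothetical policy (an assumption, not taken from the source):
    the hot_k most frequently activated experts go to "gpu",
    the next warm_k to "cpu", and all remaining experts to "ndp".
    """
    # Sort by descending count; break ties by expert id for determinism.
    ranked = sorted(activation_counts, key=lambda e: (-activation_counts[e], e))
    placement = {}
    for rank, expert in enumerate(ranked):
        if rank < hot_k:
            placement[expert] = "gpu"
        elif rank < hot_k + warm_k:
            placement[expert] = "cpu"
        else:
            placement[expert] = "ndp"
    return placement


# Example: four experts with made-up activation counts.
counts = {0: 50, 1: 5, 2: 20, 3: 0}
placement = assign_tiers(counts, hot_k=1, warm_k=2)
```

A real system would presumably refresh these counts as the workload shifts and weigh the cost of moving expert weights between devices; this sketch ignores both.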