TriMoE, a novel GPU-CPU-NDP architecture, delivers 2.83x faster MoE inference by routing "hot," "warm," and "cold" experts to the compute unit best suited to each.