Search papers, labs, and topics across Lattice.
Google
1
0
3
Splitting attention and feedforward networks onto separate GPUs can unlock 4x higher MoE LLM throughput, but only if you carefully tune the GPU partitioning strategy based on the workload.