Georgia Institute of Technology
ASTRA-sim 3.0 shows that high-fidelity simulation can guide significant optimizations of distributed ML algorithms and infrastructure design.
Splitting attention and feed-forward network (FFN) layers onto separate GPUs can unlock 4x higher throughput for MoE LLMs, but only if the GPU partitioning strategy is carefully tuned to the workload.
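The following toy sketch shows why the partitioning has to track the workload. It assumes a simple pipelined cost model that is not taken from the paper: every token passes through an attention pool and then an FFN/expert pool, each token costs a fixed number of GPU-seconds in each pool, and system throughput is limited by the slower pool. The function names `stage_throughput` and `best_split` and the cost values are hypothetical illustrations, not the paper's method or any simulator's API.

```python
from __future__ import annotations


def stage_throughput(gpus: int, cost_per_token: float) -> float:
    """Tokens/s a pool of `gpus` can sustain if each token costs
    `cost_per_token` GPU-seconds in that pool (hypothetical linear model)."""
    return gpus / cost_per_token


def best_split(total_gpus: int, attn_cost: float, ffn_cost: float) -> tuple[int, float]:
    """Exhaustively search attention/FFN splits and return
    (attention_gpus, throughput) for the best one.

    Each pool gets at least one GPU. Because every token visits attention
    and then the FFN/expert stage, pipelined throughput is the minimum of
    the two stage rates.
    """
    best = (1, 0.0)
    for attn_gpus in range(1, total_gpus):
        ffn_gpus = total_gpus - attn_gpus
        tput = min(stage_throughput(attn_gpus, attn_cost),
                   stage_throughput(ffn_gpus, ffn_cost))
        if tput > best[1]:  # keep the first split that achieves the maximum
            best = (attn_gpus, tput)
    return best


# Attention-light workload (e.g. short contexts): FFN dominates.
print(best_split(8, attn_cost=1.0, ffn_cost=3.0))
# Attention-heavy workload (e.g. long contexts): attention dominates.
print(best_split(8, attn_cost=3.0, ffn_cost=1.0))
```

Under these assumed costs, 8 GPUs peak with 2 attention GPUs in the first case and 6 in the second, both at 2.0 tokens/s. A fixed even 4/4 split reaches only about 1.33 tokens/s in either case, which is why a single static partition cannot suit every workload.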