Stop forcing your multimodal encoders to inherit suboptimal LLM parallelism strategies: heterogeneous parallelism unlocks up to 49% higher TFLOPS/GPU.
Training trillion-parameter Mixture-of-Experts models just got a whole lot faster: Megatron Core now achieves >1 PFLOPS/GPU on NVIDIA's latest hardware.