Snowflake
Optimizing multimodal training for runtime can be energy-inefficient: on Grace Hopper chips, data movement and overlap, not raw compute, dominate energy consumption.
Training LLMs on ultra-long contexts just got a whole lot easier: AutoSP automates sequence parallelism and activation checkpointing, boosting context length by up to 2.7x with negligible throughput cost.