Training Llama3-8B with 5M context on a single node is now possible, thanks to a simple head-wise chunking strategy that slashes memory use by 87.5%.
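The headline's 87.5% figure is consistent with processing one-eighth of the attention heads at a time (e.g. 4 of Llama3-8B's 32 heads per chunk), since the peak attention-score buffer then shrinks to 1/8 of its full size. The post does not include code, so the following is a minimal numpy sketch of the general idea, not the authors' implementation: loop over head groups so only a `(chunk, seq, seq)` score matrix is ever materialized, instead of `(heads, seq, seq)`. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def chunked_head_attention(q, k, v, heads_per_chunk):
    """Sketch of head-wise chunking (illustrative, not the
    paper's code): process heads in groups so the peak score
    buffer is (heads_per_chunk, S, S) rather than (H, S, S),
    cutting that footprint by a factor of H / heads_per_chunk.
    q, k, v: arrays of shape (H, S, D)."""
    H, S, D = q.shape
    out = np.empty_like(q)
    for start in range(0, H, heads_per_chunk):
        end = min(start + heads_per_chunk, H)
        # Scores for this head chunk only: (chunk, S, S).
        scores = q[start:end] @ k[start:end].transpose(0, 2, 1) / np.sqrt(D)
        # Numerically stable softmax over the key axis.
        scores -= scores.max(axis=-1, keepdims=True)
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)
        out[start:end] = probs @ v[start:end]
    return out
```

Because each head's attention is independent, chunking changes only peak memory, not the result: running with `heads_per_chunk=H` (all heads at once) and with a smaller chunk size produces identical outputs.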