BinaryAttention proves you can more than halve the runtime of attention in vision and diffusion transformers without sacrificing accuracy, simply by using the sign of queries and keys.
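A minimal sketch of the idea in PyTorch, under the assumption that binarization means collapsing queries and keys to ±1 before the dot product (the function name and float emulation are illustrative, not the paper's implementation):

```python
import torch

def binary_attention(q, k, v, scale=None):
    """Sign-based attention sketch: q, k, v are (batch, heads, seq, dim).

    Queries and keys are reduced to their signs, so the score matrix
    could in principle be computed with cheap bit-packed kernels.
    """
    scale = scale if scale is not None else q.shape[-1] ** -0.5
    q_bin = torch.sign(q)  # keep only the sign of each query component
    k_bin = torch.sign(k)  # likewise for keys; values stay full precision
    scores = (q_bin @ k_bin.transpose(-2, -1)) * scale
    return scores.softmax(dim=-1) @ v
```

Note the emulation here runs in floating point for readability; any real speedup would come from bit-packed integer kernels replacing the matmul.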
Train a competitive 2B MoE LLM on 16 commodity GPUs connected via the internet, proving you don't need a datacenter to play the LLM game.
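To make "connected via the internet" concrete, a hedged sketch of what the rendezvous could look like with stock torch.distributed: each worker dials a publicly reachable coordinator over TCP instead of a datacenter interconnect. The address, port, backend choice, and environment variables are placeholder assumptions, not the project's actual setup:

```python
import os
import torch.distributed as dist

# Hypothetical rendezvous for geographically scattered GPU hosts.
dist.init_process_group(
    backend="gloo",                          # TCP-friendly; NCCL assumes fast links
    init_method="tcp://203.0.113.7:29500",   # placeholder coordinator address
    rank=int(os.environ["RANK"]),            # 0..15, one per GPU host
    world_size=16,
)
```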