Channel-wise adaptive learning rates in Gated Delta Networks yield stronger long-context recall, rivaling softmax attention without its quadratic cost.
By amplifying updates along flat directions in the loss landscape, LITE speeds up LLM pre-training with existing matrix-based optimizers such as Muon and SOAP.