LLMs train 1.5x faster and generalize better with a surprisingly simple trick: adapt learning rates per-layer based on the "heavy-tailedness" of their weight matrices.
Autoregressive video generation gets a 6x speed boost without sacrificing quality, thanks to a motion-aware caching strategy that finally respects the fact that not all pixels are created equal.
Sparsity, often viewed as a means for efficiency, actually unlocks deeper, more effective LLMs by taming variance and boosting layer utilization.
Diffusion language models (DLMs) aren't truly parallel because their training data is too sequential, but NAP shows how data curation can unlock genuine parallel decoding and boost reasoning performance.