Forget scaling compute – the future of AI hinges on a 1000x leap in energy efficiency via tight AI+Hardware co-design over the next decade.
Pre-normalization in Transformers is the culprit behind the mysterious link between massive activation outliers and attention sinks, but decoupling them reveals their distinct functions: global parameterization vs. local attention modulation.
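The claim is easiest to ground empirically: massive activations are a handful of residual-stream coordinates whose magnitude dwarfs the rest, and attention sinks are heads that dump most of their attention mass on an early, low-information token. Below is a minimal probe for both phenomena, as a sketch assuming any Hugging Face pre-norm causal LM (GPT-2 here) that can return hidden states and attention maps; it only measures the two effects and does not implement the decoupling itself.

```python
# Minimal sketch: probing massive activation outliers and attention sinks
# in a pre-norm Transformer. The model choice and thresholds are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any pre-norm decoder-only model works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "Attention sinks and massive activations tend to co-occur."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

for layer, (h, attn) in enumerate(zip(out.hidden_states[1:], out.attentions)):
    h = h[0]  # (seq, hidden) residual-stream states after this layer
    # Massive-activation proxy: largest |activation| vs. the median magnitude.
    outlier_ratio = h.abs().max() / h.abs().median()
    # Attention-sink proxy: mean attention mass all queries place on token 0.
    sink_mass = attn[0, :, :, 0].mean()
    print(f"layer {layer:2d}  outlier_ratio={outlier_ratio.item():7.1f}  "
          f"sink_mass={sink_mass.item():.3f}")
```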
Vision models are far more data-hungry than language models, but Mixture-of-Experts can reconcile this asymmetry, enabling truly unified multimodal models.
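How MoE would absorb that asymmetry is left implicit; one plausible reading is a shared backbone whose feed-forward layers are replaced by experts, so the router can specialize some experts on image tokens and others on text. A token-level top-2 routing sketch follows, with all sizes, names, and the dense dispatch loop being illustrative assumptions rather than any specific system.

```python
# Minimal sketch of a token-level top-2 MoE feed-forward layer, as one way a
# unified multimodal Transformer could let experts specialize per modality.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (batch, seq, d_model)
        logits = self.router(x)                 # (batch, seq, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Dense loop over experts; real systems use sparse dispatch instead.
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = (idx[..., slot] == e)    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Image and text tokens share the layer; the router decides which experts fire.
tokens = torch.randn(2, 16, 512)                # e.g. a mixed image/text sequence
print(MoEFeedForward()(tokens).shape)           # torch.Size([2, 16, 512])
```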
LLMs can achieve the same accuracy with 16x less data by constraining their hidden-state trajectories to follow geodesics on a semantic manifold.
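The teaser does not say how the geodesic constraint is imposed. Read literally, it suggests an auxiliary loss that penalizes a token's hidden-state trajectory across layers for bending away from the shortest path on the representation manifold. The sketch below uses a strong simplifying assumption (Euclidean geometry, so the geodesic is just the straight line between the first and last layer's states); the function name and weighting are hypothetical, not the actual method.

```python
# Crude sketch of a trajectory-straightness regularizer on hidden states.
# A real semantic manifold would need its own metric; this assumes Euclidean.
import torch

def geodesic_deviation_loss(hidden_states):
    """hidden_states: tuple of (batch, seq, d) tensors, one per layer."""
    h = torch.stack(hidden_states, dim=0)           # (layers, batch, seq, d)
    n = h.shape[0]
    t = torch.linspace(0, 1, n, device=h.device).view(n, 1, 1, 1)
    # Straight-line interpolation between first- and last-layer states.
    straight = (1 - t) * h[0] + t * h[-1]
    return ((h - straight) ** 2).mean()

# Schematic usage inside a training step (lambda_geo is a hypothetical weight):
# out = model(**batch, output_hidden_states=True)
# loss = out.loss + lambda_geo * geodesic_deviation_loss(out.hidden_states)
```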