LLMs can get a free performance boost: decoupling compute and capacity within each layer lets you beat standard transformers at the same FLOPs.
Looping helps transformers think harder on math problems, while memory lets them remember more commonsense facts, and combining both beats simply scaling up layers.