Ditching the strict unit-sum constraint in softmax attention with a simple affine scaling trick unlocks more stable training and better downstream performance for Transformers.
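To make the idea concrete, here is a minimal sketch of what an affine rescaling of softmax attention weights could look like, assuming a learnable per-head scale and shift; the parameter names (`gamma`, `beta`) and the exact placement of the rescaling are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineScaledAttention(nn.Module):
    """Softmax attention whose weights are affinely rescaled per head,
    so each row is no longer constrained to sum exactly to one.
    Illustrative sketch only; details differ from the paper."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # Assumed per-head affine parameters: scale (gamma) and shift (beta)
        self.gamma = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.beta = nn.Parameter(torch.zeros(num_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (B, heads, T, head_dim)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        # Affine rescaling: relaxes the strict unit-sum constraint per row
        attn = self.gamma * attn + self.beta
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.out(out)
```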
Squeezing out 4.5x lower latency and 3.9x higher throughput in multi-LLM systems, PrefillShare shares the KV cache across models, slashing redundant prefill computation without sacrificing accuracy.
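A toy sketch of the cache-sharing idea is below, assuming prefill KV state can be reused when another model has already processed the same prompt prefix; the `SharedKVCache` class, its keying scheme, and the `compute_kv` helper are hypothetical illustrations, not PrefillShare's actual design or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple
import hashlib

@dataclass
class SharedKVCache:
    """Toy shared store for prefill KV state across models.
    Keyed by (model family, prompt-prefix hash); names are assumptions."""
    store: Dict[Tuple[str, str], Any] = field(default_factory=dict)

    def _key(self, family: str, prefix: str) -> Tuple[str, str]:
        return family, hashlib.sha256(prefix.encode()).hexdigest()

    def get(self, family: str, prefix: str):
        # Reuse KV state another model in the same family already computed
        return self.store.get(self._key(family, prefix))

    def put(self, family: str, prefix: str, kv: Any) -> None:
        self.store[self._key(family, prefix)] = kv

def prefill_with_sharing(model, prompt: str, cache: SharedKVCache):
    """Run prefill, skipping redundant work when a shared KV entry exists.
    `model.family` and `model.compute_kv` are hypothetical hooks."""
    kv = cache.get(model.family, prompt)
    if kv is None:
        kv = model.compute_kv(prompt)  # expensive prefill pass
        cache.put(model.family, prompt, kv)
    return kv
```

The intuition matches the blurb: when several LLMs repeatedly prefill the same prompt prefix, serving one shared cache entry instead of recomputing it per model is where the latency and throughput gains would come from.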