Nemotron 3 Super shows you can match the accuracy of existing 120B models with significantly higher inference throughput by combining Mamba, attention, and Mixture-of-Experts layers.
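A rough sketch of why the hybrid pays off at decode time: an attention layer must re-read a KV cache that grows with context length, while a Mamba-style SSM layer carries a fixed-size recurrent state. The dimensions below are illustrative assumptions, not Nemotron 3 Super's actual configuration.

```python
# Illustrative per-layer memory traffic per decoded token.
# All sizes are assumptions, not the model's real dimensions.

def attn_kv_bytes(seq_len, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes of K/V cache an attention layer reads per decoded token."""
    return 2 * seq_len * n_kv_heads * head_dim * dtype_bytes

def ssm_state_bytes(d_inner=8192, d_state=128, dtype_bytes=2):
    """Bytes of recurrent state a Mamba-style layer reads per token."""
    return d_inner * d_state * dtype_bytes

for seq_len in (1_000, 32_000, 128_000):
    print(f"{seq_len:>7} tokens: attention reads {attn_kv_bytes(seq_len):>12,} B, "
          f"SSM reads {ssm_state_bytes():>10,} B")
```

The attention cost grows linearly with context while the SSM cost stays flat, which is why replacing most attention layers with Mamba layers raises long-context throughput.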
You can slash LLM inference costs without sacrificing quality by strategically pruning experts, quantizing, and swapping full attention for windowed attention, as demonstrated on gpt-oss-120B.
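As a concrete illustration of the attention swap, here is a minimal sliding-window attention mask in PyTorch; the window size is an arbitrary example, not the setting used for gpt-oss-120B.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where each query attends to at most `window` recent keys."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

# Each row (query) sees only itself and the previous two positions.
print(sliding_window_mask(6, window=3).int())
```

Bounding the lookback caps per-token KV-cache reads at `window` entries regardless of context length, and that saving compounds with expert pruning and quantization.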
LLMs can significantly boost factual accuracy in long-form generation by strategically "toning down" uncertain details, rather than simply omitting them.
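A toy sketch of the idea, using an invented interface of (claim, confidence) pairs and an arbitrary threshold (the paper's actual method is not shown here): low-confidence details get hedged wording instead of being deleted.

```python
# Hypothetical interface: (claim_text, confidence) pairs.
# Illustrates hedging vs. omission only; threshold is arbitrary.
def hedge_claims(claims, threshold=0.7):
    kept = []
    for text, confidence in claims:
        if confidence >= threshold:
            kept.append(text)
        else:
            # Tone the claim down rather than dropping the detail.
            kept.append("Reportedly, " + text[0].lower() + text[1:])
    return kept

print(hedge_claims([
    ("The bridge opened in 1932.", 0.95),
    ("Its main span is 503 meters long.", 0.40),
]))
```

Keeping the detail in softened form preserves informativeness while reducing the rate of flatly asserted errors.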
Flipping just *two* sign bits in a large neural network can obliterate its performance, revealing a surprising fragility in even state-of-the-art models.
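A minimal PyTorch sketch of the perturbation itself (the paper's target weights are not identified here; the tensor and index are placeholders). Flipping a float32 sign bit is exactly equivalent to negating the weight.

```python
import torch

def flip_sign_bit(weights: torch.Tensor, flat_index: int) -> None:
    """Flip the IEEE-754 sign bit of one float32 weight in place."""
    flat = weights.view(-1)
    bits = flat[flat_index].view(torch.int32)           # reinterpret the 4 bytes
    sign = torch.tensor(-(1 << 31), dtype=torch.int32)  # 0x80000000 as int32
    flat[flat_index] = (bits ^ sign).view(torch.float32)

w = torch.randn(4, 4)
before = w[1, 1].item()
flip_sign_bit(w, flat_index=5)          # flat index 5 == row 1, col 1
assert w[1, 1].item() == -before        # sign-bit flip == negation
```

Because every sign bit sits at a fixed, predictable position in memory, a single memory fault (or a targeted attack) only needs to hit two high-impact weights to wreck the model.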