You can slash LLM inference costs without sacrificing quality by strategically pruning experts, quantizing, and swapping full attention for windowed attention, as demonstrated on gpt-oss-120B.
LLMs can significantly boost factual accuracy in long-form generation by strategically "toning down" uncertain details, rather than simply omitting them.