You can slash LLM inference costs without sacrificing quality by strategically pruning experts, quantizing, and swapping full attention for windowed attention, as NVIDIA demonstrated on gpt-oss-120B.
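As an illustration of the third lever, here is a minimal sketch of causal sliding-window attention in PyTorch. The function name, tensor shapes, and `window` parameter are assumptions for illustration, not NVIDIA's actual implementation:

```python
import torch
import torch.nn.functional as F

def windowed_attention(q, k, v, window: int):
    """Scaled dot-product attention restricted to a trailing window.

    q, k, v: (batch, heads, seq, dim). Each query position attends
    only to the `window` most recent positions, itself included.
    Hypothetical helper, not the gpt-oss-120B code.
    """
    seq = q.size(-2)
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale        # (b, h, seq, seq)

    # Causal sliding-window mask: query i may see key j
    # iff i - window < j <= i; everything else is masked out.
    idx = torch.arange(seq, device=q.device)
    diff = idx[:, None] - idx[None, :]                # i - j
    disallowed = (diff < 0) | (diff >= window)
    scores = scores.masked_fill(disallowed, float("-inf"))

    return F.softmax(scores, dim=-1) @ v
```

The cost saving comes from the mask's shape: with a fixed window, per-token attention compute and KV-cache reads scale with the window size rather than the full sequence length, which is why swapping full attention for windowed attention reduces inference cost on long inputs.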