Multimodal models can now handle audio natively with improved efficiency, achieving state-of-the-art results in complex tasks like document understanding and agentic computer use.
Ditch the slow lane: $R^2$-dLLM turbocharges diffusion language models by slashing decoding steps by up to 75% without sacrificing quality.
Nemotron 3 Super shows that combining Mamba, Attention, and Mixture-of-Experts layers can match the accuracy of existing 120B models while delivering significantly higher inference throughput.
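For intuition, here is a heavily simplified sketch of what such a hybrid layer pattern can look like. The 1-in-4 attention cadence, the causal-convolution stand-in for the Mamba mixer, the top-1 expert routing, and all sizes are illustrative assumptions, not details of Nemotron 3 Super.

```python
# Hedged sketch of a hybrid stack: mostly cheap sequence mixers, attention
# every few layers, MoE feed-forwards. The ConvMixer is a stand-in, NOT a
# real Mamba kernel, and causal masking in attention is omitted for brevity.
import torch
import torch.nn as nn


class ConvMixer(nn.Module):
    """Causal depthwise-conv stand-in for a state-space (Mamba-style) mixer."""
    def __init__(self, d: int, kernel: int = 4):
        super().__init__()
        self.pad = kernel - 1
        self.conv = nn.Conv1d(d, d, kernel, groups=d)

    def forward(self, x):                          # x: (batch, seq, d)
        h = nn.functional.pad(x.transpose(1, 2), (self.pad, 0))
        return self.conv(h).transpose(1, 2)


class TopOneMoE(nn.Module):
    """Feed-forward block with top-1 expert routing."""
    def __init__(self, d: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )

    def forward(self, x):
        choice = self.router(x).argmax(-1)          # (batch, seq) expert ids
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = choice == e
            if sel.any():
                out[sel] = expert(x[sel])
        return out


def build_stack(d: int = 64, n_layers: int = 8, heads: int = 4):
    layers = []
    for i in range(n_layers):
        if i % 4 == 3:                              # attention on every 4th layer
            mixer = nn.MultiheadAttention(d, heads, batch_first=True)
        else:
            mixer = ConvMixer(d)
        layers.append(nn.ModuleDict({"mixer": mixer, "ffn": TopOneMoE(d)}))
    return nn.ModuleList(layers)


stack = build_stack()
x = torch.randn(2, 16, 64)
for layer in stack:
    m = layer["mixer"]
    mixed = m(x, x, x, need_weights=False)[0] if isinstance(m, nn.MultiheadAttention) else m(x)
    x = x + mixed                                   # residual around the mixer
    x = x + layer["ffn"](x)                         # residual around the MoE FFN
print(x.shape)  # torch.Size([2, 16, 64])
```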
Swap out slow, one-token-at-a-time generation in VLMs for a 6x speed boost, without sacrificing quality, using a surprisingly simple direct conversion to block-diffusion decoding.
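As a rough illustration of the decoding style (not the paper's actual conversion recipe), here is a toy block-diffusion decoding loop: each block of positions starts fully masked and is filled in over a few parallel refinement passes, committing the most confident slots first. The `model` scorer, `MASK_ID`, the block size, and the commit heuristic are all placeholder assumptions.

```python
# Toy block-diffusion decoder: blocks are generated left to right, but the
# tokens inside a block are predicted in parallel and iteratively unmasked.
import torch

VOCAB = 32           # tiny vocabulary for the demo
MASK_ID = VOCAB      # hypothetical mask-token id outside the real vocab
block_size = 4       # tokens decoded in parallel per block
refine_steps = 2     # denoising passes per block


def model(prefix: torch.Tensor, block: torch.Tensor) -> torch.Tensor:
    """Stand-in scorer: random logits over the vocab for each block slot."""
    return torch.randn(block.shape[0], VOCAB)


def decode(prompt: torch.Tensor, num_blocks: int) -> torch.Tensor:
    out = prompt
    for _ in range(num_blocks):
        block = torch.full((block_size,), MASK_ID)
        for _ in range(refine_steps):
            logits = model(out, block)                  # score all slots at once
            conf, pred = logits.softmax(-1).max(-1)     # per-slot confidence
            masked = block == MASK_ID
            if not masked.any():
                break
            # Commit roughly half of the still-masked slots, highest confidence first.
            k = max(1, int(masked.sum()) // 2)
            idx = torch.where(masked, conf, torch.tensor(-1.0)).topk(k).indices
            block[idx] = pred[idx]
        block[block == MASK_ID] = pred[block == MASK_ID]  # fill any stragglers
        out = torch.cat([out, block])                     # next block conditions on this one
    return out


print(decode(torch.tensor([1, 2, 3]), num_blocks=2))
```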
You can slash LLM inference costs without sacrificing quality by strategically pruning experts, quantizing, and swapping full attention for windowed attention, as demonstrated on gpt-oss-120B.
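Of the three knobs, the windowed-attention swap is the easiest to picture in code. Below is a minimal sketch of a sliding-window (banded causal) attention mask applied with PyTorch's scaled_dot_product_attention; the window size and tensor shapes are illustrative assumptions, not the settings used for gpt-oss-120B.

```python
# Sliding-window attention: each position attends only to itself and the
# previous (window - 1) positions instead of the full causal prefix.
import torch
import torch.nn.functional as F

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to j only if i - window < j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

# Toy tensors: batch=1, heads=2, seq=8, head_dim=16.
q = torch.randn(1, 2, 8, 16)
k = torch.randn(1, 2, 8, 16)
v = torch.randn(1, 2, 8, 16)

mask = sliding_window_mask(seq_len=8, window=4)            # (8, 8) bool, True = attend
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 2, 8, 16])
```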