LLM-based GPU kernel generators ace synthetic benchmarks but choke on real-world production constraints, achieving at best a 0.94x speedup, which is slower than the baseline.
Unlock 2x faster LLM serving and slash warmup times by fusing kernels that gracefully handle dynamic shapes and data dependencies.