University of California, Santa Cruz
Kernel launch overhead is a bigger bottleneck than you think: GPUOS achieves up to 15.3x speedup by fusing operations at runtime.
Diagnose performance bottlenecks in large-scale AI training 100x faster with a new observability system that adds almost no overhead.
Hot-patching NCCL with eBPF lets you boost AllReduce throughput by 27% *and* verify plugin safety, all without modifying NCCL itself.