Stop hand-writing CUDA kernels: CUCo's agent-driven approach co-optimizes computation and communication, slashing LLM training/inference latency by up to 1.57x.