University of Wisconsin-Madison
Up to 1.44× faster GPU execution by rethinking task scheduling and resource utilization in CUDA pipelines.
LLMs can run up to 35% faster on chiplet architectures thanks to a new lossless exponent compression technique that reduces inter-chiplet communication overhead.