Offloading communication to SmartNIC DPUs can speed up host-dominated workloads by 1.55x, but the lack of Direct Cache Access creates a massive DRAM bottleneck.
Static GPU partitioning alone can't solve underutilization, but fine-grained CPU offloading over NVLink-C2C can bridge the gap.