Microsoft, Beijing, China
Achieve near-optimal DLRM inference speedups across diverse hardware (NVIDIA, AMD, TPU) with a single optimization pass, eliminating the need for vendor-specific tuning.