Apple's own vDSP FFT library gets smoked by a new implementation that's 29% faster, thanks to a clever two-tier memory model exploiting the GPU's register file and threadgroup memory.