Search papers, labs, and topics across Lattice.
2
0
4
0
K-means gets a 17.9x speed boost on modern GPUs thanks to a clever redesign that avoids memory bottlenecks and atomic write contention.
Asynchronous RL for LLMs can be sped up 2.5x by explicitly controlling policy-gradient variance, without sacrificing synchronous performance.