Search papers, labs, and topics across Lattice.
2
0
5
Commodity GPU servers can achieve surprisingly high LLM inference throughput by cleverly orchestrating pipeline parallelism with KV cache offloading.
Ditch online training for neural combinatorial solvers: ECO leverages offline self-play with Mamba to achieve state-of-the-art performance with significantly improved memory utilization and training throughput.