SSM inference can be significantly accelerated on multi-GPU systems with a communication-aware tensor parallelism strategy, achieving up to 4x throughput gains on Mamba models.
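The core idea can be illustrated with a toy sketch of tensor parallelism: a projection matrix is sharded across devices, each device computes a partial product, and one collective sum (standing in for an all-reduce across GPUs) recombines the result. This is a minimal NumPy simulation under assumed shapes, not the paper's actual implementation; the function name and sharding scheme here are hypothetical.

```python
import numpy as np

def tensor_parallel_matmul(x, weight, num_devices):
    """Simulate input-dimension tensor parallelism for a projection layer.

    x: (batch, d_in) activations; weight: (d_in, d_out).
    Each simulated device holds one slice of the input dimension and
    computes a partial product; the final sum stands in for the single
    all-reduce a communication-aware strategy tries to minimize.
    """
    x_shards = np.array_split(x, num_devices, axis=1)
    w_shards = np.array_split(weight, num_devices, axis=0)
    partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
    return np.sum(partials, axis=0)  # the communication (all-reduce) step

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w = rng.standard_normal((16, 8))
out = tensor_parallel_matmul(x, w, num_devices=4)
assert np.allclose(out, x @ w)  # sharded result matches the unsharded matmul
```

Because the partial sums are mathematically exact, the sharded output matches the single-device result; the performance question is purely how often and over what volume the all-reduce fires.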