By interleaving Transformer and Mamba layers, MaBERT achieves a 2x speedup in training and inference for long-context masked language modeling without sacrificing GLUE benchmark performance.
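The summary doesn't specify MaBERT's actual layer layout, so the sketch below is a rough illustration only: a BERT-style masked-LM encoder that alternates standard PyTorch Transformer encoder layers with Mamba blocks from the open-source `mamba_ssm` package. The class name (`HybridMaskedLMEncoder`), the 1:1 interleave ratio, and all hyperparameters are assumptions, not the paper's published configuration.

```python
# Hypothetical interleaved Transformer/Mamba masked-LM encoder.
# Interleave ratio, dimensions, and names are assumptions, not MaBERT's design.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # open-source selective-SSM block (needs CUDA)

class HybridMaskedLMEncoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, n_pairs=6, n_heads=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layers = []
        for _ in range(n_pairs):
            # One full-attention layer for global token mixing...
            layers.append(nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model,
                batch_first=True, norm_first=True))
            # ...followed by one linear-time Mamba block, which is where the
            # long-context savings come from (no quadratic attention cost).
            layers.append(Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2))
        self.layers = nn.ModuleList(layers)
        self.norm = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)  # masked-token predictions

    def forward(self, input_ids):
        x = self.embed(input_ids)             # (batch, seq_len, d_model)
        for layer in self.layers:
            if isinstance(layer, Mamba):
                x = x + layer(x)              # residual around the SSM block
            else:
                x = layer(x)                  # encoder layer has its own residuals
        return self.lm_head(self.norm(x))    # (batch, seq_len, vocab_size)
```

The intuition for the speedup claim holds in this sketch as well: Mamba blocks scale linearly with sequence length, so replacing a fraction of the quadratic attention layers with them cuts long-context compute while the remaining attention layers preserve global token mixing.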