Feb 19, 2026 · arXiv:2602.17363

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Gabriel Mongaras, Eric C. Larson

AI Summary

The authors simplify and modify Mamba-2, a linear attention variant, to improve its accuracy and bridge the gap with softmax attention. Through ablation studies, they identify key components of Mamba-2 that contribute to its accuracy and develop a simplified version, Mamba-2S. By enhancing the A-mask and increasing the order of the hidden state in Mamba-2S, they create 2Mamba, achieving near softmax attention accuracy with better memory efficiency for long contexts.

Key Contribution

2Mamba closes the accuracy gap between linear and softmax attention, offering a memory-efficient alternative without sacrificing performance.

Abstract

Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive and results in reduced accuracy compared to softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we manipulate Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. From this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, resulting in a method, which we call 2Mamba, that is nearly as accurate as softmax attention, yet much more memory efficient for long context lengths. We also investigate elements of Mamba-2 that help surpass softmax attention accuracy. Code is provided for all our experiments.
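
To make the memory argument in the abstract concrete, here is a minimal sketch of generic causal linear attention with a per-step scalar decay, which is the general family Mamba-2 belongs to. It is not the authors' 2Mamba code. The function names, the tiny dimensions, and the scalar decay `a` (standing in for an A-mask-style decay) are assumptions made for illustration. The sketch computes the same outputs two ways: as a masked quadratic sum over all past positions, and as a recurrence over a fixed-size hidden state that does not grow with sequence length.

```python
import random

# Illustrative only: generic decayed causal linear attention,
# not the 2Mamba method described in the paper.

def masked_form(q, k, v, a):
    """Quadratic form: y_t = sum_{s<=t} (a_{s+1} * ... * a_t) * (q_t . k_s) * v_s."""
    n, dv = len(q), len(v[0])
    out = []
    for t in range(n):
        y = [0.0] * dv
        for s in range(t + 1):
            # Cumulative decay between position s and position t.
            decay = 1.0
            for r in range(s + 1, t + 1):
                decay *= a[r]
            score = decay * sum(qi * ki for qi, ki in zip(q[t], k[s]))
            for j in range(dv):
                y[j] += score * v[s][j]
        out.append(y)
    return out

def recurrent_form(q, k, v, a):
    """Recurrence: H_t = a_t * H_{t-1} + k_t v_t^T, then y_t = q_t^T H_t."""
    dk, dv = len(k[0]), len(v[0])
    # The hidden state is dk x dv regardless of how long the sequence is.
    H = [[0.0] * dv for _ in range(dk)]
    out = []
    for t in range(len(q)):
        for i in range(dk):
            for j in range(dv):
                H[i][j] = a[t] * H[i][j] + k[t][i] * v[t][j]
        out.append([sum(q[t][i] * H[i][j] for i in range(dk)) for j in range(dv)])
    return out

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
n, dk, dv = 6, 4, 3                                 # hypothetical toy sizes
q, k, v = rand_matrix(n, dk), rand_matrix(n, dk), rand_matrix(n, dv)
a = [random.uniform(0.5, 1.0) for _ in range(n)]    # hypothetical decay values

y_masked = masked_form(q, k, v, a)
y_rec = recurrent_form(q, k, v, a)
max_diff = max(
    abs(x - y)
    for row_m, row_r in zip(y_masked, y_rec)
    for x, y in zip(row_m, row_r)
)
print("max difference between the two forms:", max_diff)
```

The two forms agree up to floating-point rounding. The recurrent form only ever stores a `dk x dv` state, whereas softmax attention has to keep every past key and value, which is where the long-context memory advantage comes from.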

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 30
Year: 2026
Venue: N/A
