This paper introduces a Mamba-Transformer hybrid architecture for extractive summarization that addresses the quadratic complexity bottleneck of transformers on long documents, particularly in low-resource settings. The model uses a transformer encoder for sentence-level semantics, a Mamba state space model to capture inter-sentence dependencies, and a linear classifier for sentence relevance. Experiments across news, argumentative, and scientific domains under low-resource conditions show statistically significant ROUGE improvements over BERTSUM and MATCHSUM (p < 0.001), with the largest gains on the longest documents, and 24-27% faster inference on CNN/DailyMail.
Mamba's linear-time processing lets you summarize long documents without truncation, yielding significant ROUGE gains (+0.23 ROUGE-1 on ArXiv) over BERTSUM and MATCHSUM in low-resource settings.
Extractive summarization of long documents is bottlenecked by the quadratic complexity of transformers, which often forces truncation and limits deployment in resource-constrained settings. We introduce the first Mamba-Transformer hybrid for extractive summarization, combining the semantic strength of pre-trained transformers with the linear-time processing of state space models. Because Mamba can process full documents without truncation, our approach preserves context while maintaining strong summarization quality. The architecture includes: (1) a transformer encoder for sentence-level semantics, (2) a Mamba state space model to capture inter-sentence dependencies efficiently, and (3) a linear classifier for sentence relevance prediction. Across news, argumentative, and scientific domains under low-resource conditions, our method achieves: (1) large gains over BERTSUM and MATCHSUM, including +0.23 ROUGE-1 on ArXiv and statistically significant improvements on all datasets (p < 0.001); (2) consistent advantages across domains, strongest on the longest documents; (3) robust performance with limited training data; and (4) 24-27% faster inference on news summarization (CNN/DailyMail). These results show that a hybrid Transformer-state space architecture delivers significant ROUGE improvements for summarization in low-resource scenarios.
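The three-stage pipeline described in the abstract (sentence encoder, sequence model over sentences, linear relevance classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: sentence embeddings are passed in as plain lists of floats rather than produced by a transformer encoder, the Mamba layer is replaced by a toy fixed-decay linear recurrence (Mamba's selective, input-dependent parameters are omitted), and the classifier weights are hand-picked rather than trained. All function names (`ssm_scan`, `score_sentences`, `extract_summary`) are hypothetical. The point is only to show that the inter-sentence stage runs in a single pass whose cost grows linearly with the number of sentences.

```python
import math
from typing import List, Sequence

Vector = List[float]


def ssm_scan(embeddings: Sequence[Vector], decay: float = 0.9) -> List[Vector]:
    """Toy linear state-space recurrence over sentence embeddings.

    h_t = decay * h_{t-1} + (1 - decay) * x_t
    One pass over the sentences, so cost is linear in document length.
    ASSUMPTION: a fixed-decay stand-in for the Mamba layer, not Mamba itself.
    """
    if not embeddings:
        return []
    state = [0.0] * len(embeddings[0])
    outputs: List[Vector] = []
    for x in embeddings:
        state = [decay * h + (1.0 - decay) * xi for h, xi in zip(state, x)]
        outputs.append(list(state))
    return outputs


def score_sentences(states: Sequence[Vector], weights: Vector,
                    bias: float = 0.0) -> List[float]:
    """Linear classifier plus sigmoid: one relevance score per sentence."""
    scores = []
    for h in states:
        logit = sum(w * hi for w, hi in zip(weights, h)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def extract_summary(sentences: Sequence[str], embeddings: Sequence[Vector],
                    weights: Vector, k: int = 2) -> List[str]:
    """Select the k highest-scoring sentences, kept in document order."""
    states = ssm_scan(embeddings)
    scores = score_sentences(states, weights)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # ASSUMPTION: hand-made 1-dimensional "embeddings" for illustration only.
    sents = ["Intro sentence.", "Key finding.", "Follow-up detail."]
    embs = [[0.0], [1.0], [0.0]]
    print(extract_summary(sents, embs, weights=[1.0], k=2))
```

In a real system the embeddings would come from a pre-trained transformer encoder applied to each sentence, and the recurrence would be a trained Mamba block. The structure of the pipeline, with each sentence visited once by the sequence model, stays the same.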