Stanford HAI
∗Equal contribution
Apr 9, 2026
arXiv:2604.07935

The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency

Robin Geens, Jonas De Schouwer, Marian Verhelst, Thierry Tambe

AI Summary

This paper analyzes the architectural evolution of State-Space Models (SSMs) such as Mamba, showing that the changes made in Mamba-3 to maximize hyperscale GPU throughput have sacrificed efficiency on edge devices. The authors find that these changes increase edge latency relative to earlier versions by 28% at 880M parameters and by 48% at 15M parameters, despite the models' theoretical linear complexity. They advocate decoupling cloud-scale design strategies from edge-native design principles to preserve the benefits of SSMs for real-time edge applications.

Key Contribution

Mamba's pursuit of hyperscale GPU saturation has backfired at the edge: Mamba-3 incurs 28% to 48% higher latency on edge devices, despite the linear complexity that made SSMs attractive there.

Abstract

The Hardware Lottery posits that research directions are dictated by available silicon compute platforms. We identify a derivative phenomenon, the Hyperscale Lottery, where model architectures are optimized for cloud throughput at the expense of algorithmic efficiency. While State-Space Models (SSMs) such as Mamba were lauded for their linear complexity, ideal for edge intelligence, their evolution from Mamba-1 to Mamba-3 reveals a systematic divergence from edge-native efficiency. We demonstrate that Mamba-3's architectural changes, designed to saturate hyperscale GPUs, impose a significant edge penalty: a 28% latency increase at 880M parameters, worsening to 48% for 15M-parameter models. We argue for decoupling cloud-scale saturation strategies from core architectural design to preserve the viability of single-user, real-time edge intelligence.
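The linear-complexity property mentioned in the abstract can be illustrated with a toy recurrence. The following is a minimal sketch, assuming a diagonal, time-invariant SSM with fixed parameters; the function names and the parameters `a`, `b`, `c` are hypothetical, and this is not the Mamba-1 or Mamba-3 architecture, whose details the abstract does not give. It only shows why single-user decoding costs a constant amount of work and memory per token.

```python
def ssm_step(h, x_t, a, b, c):
    """One decode step of a toy diagonal linear SSM (illustrative only).

    h       : list of floats, hidden state carried between tokens
    x_t     : float, input at the current time step
    a, b, c : lists of floats, fixed per-channel parameters (assumed;
              real Mamba variants use input-dependent parameters)
    """
    # State update: h_i <- a_i * h_i + b_i * x_t, O(len(h)) work per token.
    h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
    # Readout: y_t = sum_i c_i * h_i.
    y_t = sum(c_i * h_i for c_i, h_i in zip(c, h))
    return h, y_t


def ssm_decode(xs, a, b, c):
    """Process a sequence token by token with a fixed-size state."""
    h = [0.0] * len(a)  # state size does not grow with sequence length
    ys = []
    for x_t in xs:
        h, y_t = ssm_step(h, x_t, a, b, c)
        ys.append(y_t)
    return ys


if __name__ == "__main__":
    # Impulse response of a 2-channel toy model.
    print(ssm_decode([1.0, 0.0, 0.0], [0.5, 0.25], [1.0, 1.0], [1.0, 2.0]))
```

Total work grows linearly with the number of tokens and the state stays the same size throughout, which is the property that makes this family of models appealing for batch-size-one inference on constrained hardware.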

Architecture Design (Transformers, SSMs, MoE)
Inference & Quantization
Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 18

Year: 2026

Venue: N/A
