The paper introduces Switch-Hurdle, a novel forecasting framework for intermittent demand that combines a Mixture-of-Experts (MoE) encoder with a Hurdle-based probabilistic decoder. The MoE encoder uses Top-1 routing with a straight-through estimator for efficient training, while the Hurdle decoder explicitly models the probability of a sale and the conditional quantity given a sale using cross-attention autoregression. Experiments on the M5 benchmark and a proprietary retail dataset demonstrate state-of-the-art prediction performance and scalability compared to existing methods.
Switch-Hurdle tackles intermittent demand forecasting by disentangling sales probability and quantity prediction, achieving state-of-the-art accuracy while scaling efficiently using a Mixture-of-Experts architecture.
Intermittent demand, a pattern characterized by long sequences of zero sales punctuated by sporadic non-zero values, poses a persistent challenge in retail and supply chain forecasting. Both traditional methods (such as ARIMA, exponential smoothing, and Croston variants) and modern neural architectures (such as DeepAR and Transformer-based models) often underperform on such data, because they treat demand as a single continuous process or become computationally expensive when scaled across many sparse series. To address these limitations, we introduce Switch-Hurdle, a new framework that integrates a Mixture-of-Experts (MoE) encoder with a Hurdle-based probabilistic decoder. The encoder applies sparse Top-1 expert routing in the forward pass while remaining approximately dense in the backward pass via a straight-through estimator (STE). The decoder follows a cross-attention autoregressive design with a shared hurdle head that explicitly separates the forecasting task into two components: a binary classification component that estimates the probability of a sale, and a conditional regression component that predicts the quantity given a sale. This structured separation enables the model to capture both the occurrence and the magnitude processes inherent to intermittent demand. Empirical results on the M5 benchmark and a large proprietary retail dataset show that Switch-Hurdle achieves state-of-the-art prediction performance while maintaining scalability.
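The two-part hurdle decomposition described above can be illustrated with a small sketch. The abstract does not state which distribution the conditional quantity component uses, so the zero-truncated Poisson below is an assumption chosen for illustration, and the names `hurdle_log_prob`, `hurdle_mean`, `p_sale`, and `rate` are hypothetical rather than taken from the paper.

```python
import math


def hurdle_log_prob(y, p_sale, rate):
    """Log-probability of an observed count y under a simple hurdle model.

    p_sale: probability that a sale occurs (the binary component), in (0, 1).
    rate:   Poisson rate of the quantity given a sale (assumed distribution).
    """
    if y == 0:
        # No sale: only the occurrence component contributes.
        return math.log1p(-p_sale)
    # Sale occurred: occurrence term plus zero-truncated Poisson term,
    # P(Y = y | Y > 0) = Poisson(y; rate) / (1 - exp(-rate)).
    log_poisson = y * math.log(rate) - rate - math.lgamma(y + 1)
    log_nonzero_mass = math.log(-math.expm1(-rate))
    return math.log(p_sale) + log_poisson - log_nonzero_mass


def hurdle_mean(p_sale, rate):
    """Expected demand: P(sale) times the expected quantity given a sale."""
    conditional_mean = rate / (-math.expm1(-rate))
    return p_sale * conditional_mean
```

In a setup like this, a training loss would be the negative of `hurdle_log_prob` summed over observations, and a point forecast could be read from `hurdle_mean`. Because zeros are handled entirely by `p_sale`, the quantity component never has to explain the long runs of zero sales.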