This paper tackles the sequential service region design (SSRD) problem: choosing when and where to invest under demand uncertainty, subject to a per-period limit on the number of regions that can be invested in and to stochastic spillover effects between regions. The authors model the intertemporal trade-offs using real options analysis (ROA) and address the combinatorial complexity of sequencing regional portfolios with a Transformer-based Proximal Policy Optimization (TPPO) algorithm. Experiments on realistic multi-region settings show that TPPO converges faster than benchmark DRL methods, identifies investment sequences with higher option value, and remains robust under varying conditions.
Forget exhaustive enumeration: a Transformer-based reinforcement learning approach can efficiently optimize sequential service region design under uncertainty, outperforming standard DRL methods.
Service region design determines the geographic coverage of service networks, shaping long-term operational performance. Capital and operational constraints preclude simultaneous large-scale deployment, requiring expansion to proceed sequentially. The resulting challenge is to determine when and where to invest under demand uncertainty, balancing intertemporal trade-offs between early and delayed investment and accounting for network effects whereby each deployment reshapes future demand through inter-regional connectivity. This study addresses a sequential service region design (SSRD) problem incorporating two practical yet underexplored factors: a $k$-region constraint that limits the number of regions investable per period and a stochastic spillover effect linking investment decisions to demand evolution. The resulting problem requires sequencing regional portfolios under uncertainty, leading to a combinatorial explosion in feasible investment sequences. To address this challenge, we propose a solution framework that integrates real options analysis (ROA) with a Transformer-based Proximal Policy Optimization (TPPO) algorithm. ROA evaluates the intertemporal option value of investment sequences, while TPPO learns sequential policies that directly generate high option-value sequences without exhaustive enumeration. Numerical experiments on realistic multi-region settings demonstrate that TPPO converges faster than benchmark DRL methods and consistently identifies sequences with superior option value. Case studies and sensitivity analyses further confirm robustness and provide insights on investment concurrency, regional prioritization, and the increasing benefits of adaptive expansion via our approach under stronger spillovers and dynamic market conditions.
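To give a rough sense of the combinatorial explosion mentioned in the abstract, the sketch below counts feasible investment sequences for a toy version of the problem. It is not the paper's formulation: it assumes each region is invested in at most once, that at most `k` regions can be chosen per period, that regions may remain uninvested at the end of the horizon, and that order within a period does not matter. The function name `count_sequences` and its parameters are hypothetical, introduced only for illustration.

```python
from functools import lru_cache
from math import comb


def count_sequences(n_regions: int, n_periods: int, k: int) -> int:
    """Count investment sequences under a per-period limit of k regions.

    Illustrative assumptions (not taken from the paper): each region is
    invested in at most once, regions may be left uninvested, and the
    order of regions within a single period is irrelevant.
    """

    @lru_cache(maxsize=None)
    def ways(period: int, remaining: int) -> int:
        # Past the last period: exactly one way to finish (do nothing more).
        if period == n_periods:
            return 1
        # Choose j of the remaining regions this period, with 0 <= j <= k.
        return sum(
            comb(remaining, j) * ways(period + 1, remaining - j)
            for j in range(min(k, remaining) + 1)
        )

    return ways(0, n_regions)


if __name__ == "__main__":
    # Print how the count grows with the number of regions for fixed T and k.
    for n in (5, 10, 15, 20):
        print(n, count_sequences(n_regions=n, n_periods=5, k=2))
```

Even in this simplified setting the count grows quickly with the number of regions, which is the kind of growth that motivates learning a policy to generate sequences rather than enumerating them.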