This paper tackles the challenge of dynamically allocating tasks to heterogeneous autonomous aerial vehicles (AAVs) in urban logistics by formulating it as an overlapping coalition formation game. To handle stochastic task arrivals, the authors introduce a transformer-based soft actor-critic network that learns to guide coalition updates, replacing traditional heuristics. The resulting algorithm, proven to converge to a Nash-stable equilibrium, achieves a 39.76% cost reduction relative to the heuristic baseline in a simulated scenario with 32 AAVs and 80 tasks, and is validated through indoor flight experiments.
Ditch the heuristics: a transformer-guided reinforcement learning approach slashes logistics costs by nearly 40% when allocating tasks to drone fleets.
In dynamic urban logistics, the stochastic emergence of time-sensitive tasks makes optimal task allocation among heterogeneous autonomous aerial vehicles (AAVs) difficult. To address this problem, a reinforcement-learning-enhanced overlapping coalition formation (OCF) game approach is proposed. A dynamic task allocation model is established in which global optimality is quantified by a generalized logistics cost that couples service quality and resource consumption. To handle the time-varying task sets induced by stochastic order arrivals, a transformer-based soft actor-critic network is designed. By leveraging multi-head self-attention to encode variable-length logistics states and capture task-wise spatiotemporal dependencies, the learned policy adaptively guides coalition updates, replacing the heuristic rules in the OCF game. On this basis, heterogeneous AAVs can form more efficient overlapping coalitions for dynamic logistics tasks. The resulting coalition formation process is proven to constitute an exact potential game, which guarantees convergence to a Nash-stable equilibrium within a finite number of iterations. Numerical simulations demonstrate that the proposed algorithm improves task allocation under the generalized logistics cost criterion. In a scenario with 32 AAVs and 80 tasks, the proposed algorithm achieves a 39.76% cost reduction compared with the heuristic OCF baseline. Indoor flight experiments further validate its practicality.
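The convergence claim rests on a general property of exact potential games: every unilateral improving move lowers a shared potential by exactly the amount the moving agent gains, so improvement dynamics must stop after finitely many moves. The sketch below illustrates that property on a toy congestion game, not on the paper's model. It assumes each agent joins a single task (no overlapping coalitions), uses a made-up integer cost `linear_cost`, and omits the transformer-based soft actor-critic policy entirely. All function names are hypothetical.

```python
import random


def linear_cost(task, load):
    """Hypothetical per-agent cost of a task shared by `load` agents (integers only)."""
    base = [1, 2, 3]  # assumed per-task base costs, for illustration
    return base[task] * load


def task_loads(assign, n_tasks):
    """Count how many agents currently sit on each task."""
    loads = [0] * n_tasks
    for t in assign:
        loads[t] += 1
    return loads


def potential(assign, n_tasks, cost):
    """Rosenthal potential: for each task, sum cost(task, 1..load)."""
    loads = task_loads(assign, n_tasks)
    return sum(cost(t, k) for t in range(n_tasks) for k in range(1, loads[t] + 1))


def is_nash_stable(assign, n_tasks, cost):
    """True if no single agent can lower its own cost by switching tasks."""
    loads = task_loads(assign, n_tasks)
    for cur in assign:
        for t in range(n_tasks):
            if t != cur and cost(t, loads[t] + 1) < cost(cur, loads[cur]):
                return False
    return True


def best_response_dynamics(n_agents, n_tasks, cost, seed=0):
    """Let agents take turns making strictly improving moves until none exists."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_tasks) for _ in range(n_agents)]
    loads = task_loads(assign, n_tasks)
    history = [potential(assign, n_tasks, cost)]
    improved = True
    while improved:
        improved = False
        for i in range(n_agents):
            cur = assign[i]
            best_t, best_c = cur, cost(cur, loads[cur])
            for t in range(n_tasks):
                if t != cur and cost(t, loads[t] + 1) < best_c:
                    best_t, best_c = t, cost(t, loads[t] + 1)
            if best_t != cur:
                # The agent's cost change equals the potential's change exactly.
                loads[cur] -= 1
                loads[best_t] += 1
                assign[i] = best_t
                history.append(potential(assign, n_tasks, cost))
                improved = True
    return assign, history


assignment, potential_history = best_response_dynamics(6, 3, linear_cost)
print(assignment, potential_history)
```

Because the potential takes integer values, is bounded below, and strictly decreases on every move, the loop terminates, and the final assignment is Nash-stable by construction. In the approach described above, the learned policy replaces heuristic rules for choosing which coalition update to attempt; this toy loop simply tries agents in a fixed order.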