This paper introduces a multi-agent reinforcement learning (MARL) framework for traffic signal control that improves generalization and stability in dynamic traffic scenarios. The framework incorporates turning ratio randomization during training, a stability-oriented exponential phase duration adjustment action space, and a neighbor-based observation scheme using MAPPO with centralized training and decentralized execution (CTDE). Experiments in the Vissim traffic simulator demonstrate that the proposed framework reduces average waiting time by over 10% compared to standard RL baselines and exhibits better generalization to unseen traffic patterns.
Forget brittle, overfit traffic signal controllers: this MARL framework uses turning ratio randomization and exponential phase duration adjustments to cut average waiting time by more than 10% relative to standard RL baselines while generalizing to unseen traffic patterns.
Reinforcement Learning (RL) in Traffic Signal Control (TSC) faces significant hurdles in real-world deployment due to limited generalization to dynamic traffic flow variations. Existing approaches often overfit static patterns and use action spaces incompatible with driver expectations. This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework validated in the Vissim traffic simulator. The framework integrates three mechanisms: (1) Turning Ratio Randomization, a training strategy that exposes agents to dynamic turning probabilities to enhance robustness against unseen scenarios; (2) a stability-oriented Exponential Phase Duration Adjustment action space, which balances responsiveness and precision through cyclical, exponential phase adjustments; and (3) a Neighbor-Based Observation scheme utilizing the MAPPO algorithm with Centralized Training with Decentralized Execution (CTDE). By leveraging centralized updates, this approach approximates the efficacy of global observations while maintaining scalable local communication. Experimental results demonstrate that our framework outperforms standard RL baselines, reducing average waiting time by over 10%. The proposed model exhibits superior generalization in unseen traffic scenarios and maintains high control stability, offering a practical solution for adaptive signal control.
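The abstract names the three mechanisms but does not give their exact form, so the following is only a minimal Python sketch of how they might look, not the authors' implementation. Every name and number here is an assumption made for illustration: the uniform perturbation in `randomize_turning_ratios`, the signed power-of-two step set `ADJUSTMENTS`, the green-time bounds, and the flat concatenation in `neighbor_observation`. The sketch also assumes the phase order stays fixed and the agent only changes durations.

```python
import random

def randomize_turning_ratios(nominal, spread=0.5, rng=random):
    """Perturb nominal turning probabilities and renormalize them.

    Assumption: each ratio is scaled by a uniform factor in
    [1 - spread, 1 + spread]; the paper may sample differently.
    `nominal` must have a positive sum and `spread` must be below 1.
    """
    perturbed = {
        movement: p * rng.uniform(1.0 - spread, 1.0 + spread)
        for movement, p in nominal.items()
    }
    total = sum(perturbed.values())
    return {movement: p / total for movement, p in perturbed.items()}

# Hypothetical action set: signed powers of two, in seconds.
# Small steps allow fine tuning, large steps allow fast reaction.
ADJUSTMENTS = [-8, -4, -2, -1, 0, 1, 2, 4, 8]

def adjust_phase_duration(current, action_index, min_green=5, max_green=60):
    """Apply the chosen adjustment and clamp to assumed green-time bounds."""
    proposed = current + ADJUSTMENTS[action_index]
    return max(min_green, min(max_green, proposed))

def neighbor_observation(local_states, neighbors, agent_id):
    """Concatenate an agent's own features with its direct neighbors' features."""
    observation = list(local_states[agent_id])
    for neighbor_id in neighbors[agent_id]:
        observation.extend(local_states[neighbor_id])
    return observation

if __name__ == "__main__":
    # Resample turning ratios at the start of each training episode.
    ratios = randomize_turning_ratios({"left": 0.2, "through": 0.6, "right": 0.2})
    print(ratios)
    # Lengthen a 30 s phase by the largest step in the assumed action set.
    print(adjust_phase_duration(30, 8))
```

In this sketch the ratios would be resampled once per training episode, so no single turning pattern can be memorized, and the clamped duration change keeps the cycle structure intact instead of letting the agent jump between phases. Intersections with different numbers of neighbors would yield observations of different lengths, so a real implementation would likely need padding or a fixed neighbor count.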