DGLight fine-tunes LLMs for traffic signal control by using a DQN critic, trained on structured intersection states, to score candidate actions generated by the LLM. It then applies Group Relative Policy Optimization (GRPO) to optimize the LLM policy from the critic's dense per-state supervision rather than from sparse environment rewards. Experiments on the Jinan and Hangzhou datasets show that DGLight outperforms the other LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets that were not used to train the critic.
LLMs can learn effective traffic signal control policies from a DQN critic's action-value estimates, achieving strong performance and interpretable decisions without relying solely on sparse environment rewards.
Traffic signal control (TSC) plays a central role in reducing congestion and maintaining urban mobility. This dissertation introduces DGLight, a critic-guided reinforcement-learning framework for adapting a pretrained large language model to TSC. DGLight first trains a CoLight-based Deep Q-Network critic to estimate traffic-aware action values from structured intersection states, then uses the frozen critic to score candidate language-model actions and optimize the policy with Group Relative Policy Optimization (GRPO). The resulting controller maps traffic states to interpretable reasoning traces and signal decisions while learning from dense per-state supervision rather than raw cumulative environment rewards. Experiments on TSC benchmarks covering Jinan and Hangzhou show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic. Qualitative examples further show that the model's generated reasoning is interpretable and aligned with the chosen signal phase. The project code is available at https://github.com/yyccbb/FYP_LLMTSC.
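To make the critic-guided GRPO step more concrete, here is a minimal sketch of how a frozen critic's action values could turn a group of sampled LLM completions into group-relative advantages. It is not taken from the DGLight code. Several details are assumptions: the `<signal>…</signal>` answer tag, the phase names (`ETWT`, `NTST`, etc.), using the raw Q-value as the reward, and the fixed `invalid_penalty` for unparseable answers are all hypothetical. The advantage formula (reward minus group mean, divided by group standard deviation) is the standard GRPO normalization.

```python
import re
import statistics
from typing import Dict, List, Optional

# Hypothetical answer format (assumption): the LLM ends its reasoning with a
# tag such as "<signal>ETWT</signal>" naming exactly one signal phase.
PHASE_PATTERN = re.compile(r"<signal>\s*([A-Z]+)\s*</signal>")


def parse_phase(completion: str, valid_phases: List[str]) -> Optional[str]:
    """Extract the chosen phase from a completion, or None if missing/invalid."""
    match = PHASE_PATTERN.search(completion)
    if match is None or match.group(1) not in valid_phases:
        return None
    return match.group(1)


def critic_rewards(completions: List[str], q_values: Dict[str, float],
                   invalid_penalty: float) -> List[float]:
    """Score each sampled completion with the frozen critic's Q(s, a).

    q_values maps every valid phase to the critic's action value for the
    current intersection state s (assumed precomputed). Completions whose
    phase cannot be parsed receive a fixed, hypothetical penalty.
    """
    rewards = []
    for text in completions:
        phase = parse_phase(text, list(q_values))
        rewards.append(q_values[phase] if phase is not None else invalid_penalty)
    return rewards


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical critic outputs for one intersection state.
    q = {"ETWT": 2.0, "NTST": -1.0, "ELWL": 0.5, "NLSL": 0.0}
    group = [
        "East-west queues are longest... <signal>ETWT</signal>",
        "North-south has waiting cars... <signal>NTST</signal>",
        "I am not sure.",  # no parseable phase
        "Release the east-west through traffic. <signal>ETWT</signal>",
    ]
    rewards = critic_rewards(group, q, invalid_penalty=-2.0)
    print(rewards)  # [2.0, -1.0, -2.0, 2.0]
    print(group_relative_advantages(rewards))
```

Because the advantages are standardized within each group, completions that pick the critic's preferred phase are pushed up relative to their siblings and the rest are pushed down. The policy therefore gets a learning signal at every state without waiting for cumulative environment returns. If every completion in a group receives the same score, all advantages are zero and that group contributes no gradient.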