NUS | Apr 28, 2026 | arXiv:2604.25259

DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control

AI Summary

DGLight fine-tunes a pretrained LLM for traffic signal control by using a frozen DQN critic, trained on structured intersection states, to score the candidate actions the LLM generates. The LLM policy is then optimized with Group Relative Policy Optimization (GRPO) on the critic's dense per-state supervision rather than on raw cumulative environment rewards. Experiments on the Jinan and Hangzhou datasets show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic.

Key Contribution

An LLM can learn an effective traffic signal control policy from the per-state action scores of a frozen DQN critic, yielding competitive performance and interpretable reasoning traces without optimizing directly on raw cumulative environment rewards.

Abstract

Traffic signal control (TSC) plays a central role in reducing congestion and maintaining urban mobility. This dissertation introduces DGLight, a critic-guided reinforcement-learning framework for adapting a pretrained large language model to TSC. DGLight first trains a CoLight-based Deep Q-Network critic to estimate traffic-aware action values from structured intersection states, then uses the frozen critic to score candidate language-model actions and optimize the policy with Group Relative Policy Optimization (GRPO). The resulting controller maps traffic states to interpretable reasoning traces and signal decisions while learning from dense per-state supervision rather than raw cumulative environment rewards. Experiments on TSC benchmarks covering Jinan and Hangzhou show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic. Qualitative examples further show that the model's generated reasoning is interpretable and aligned with the chosen signal phase. The project code is available at https://github.com/yyccbb/FYP_LLMTSC.
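The critic-scoring step described in the abstract can be illustrated with a minimal sketch. It assumes that the reward for each sampled candidate is simply the frozen critic's Q-value for the phase that candidate selects, and that GRPO advantages are obtained by standardizing those rewards within the group sampled for one state. The phase names, the `q_values` dictionary, and both helper functions are hypothetical stand-ins for illustration; they are not taken from the DGLight code, and the actual reward shaping may differ.

```python
from statistics import mean, pstdev


def score_candidates(q_values, candidate_phases):
    """Look up the critic's Q-value for each candidate's chosen phase.

    q_values is a hypothetical stand-in for the frozen critic's output
    on a single intersection state: a mapping from phase name to Q-value.
    """
    return [q_values[phase] for phase in candidate_phases]


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled candidates.

    Each reward is centered on the group mean and divided by the group
    (population) standard deviation. eps avoids division by zero when
    all candidates receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed critic output for one state (illustrative numbers only).
q_values = {"NS_through": 2.0, "EW_through": 1.0, "NS_left": 0.0, "EW_left": 1.0}

# Phases chosen by four candidate completions sampled for the same state.
candidates = ["NS_through", "EW_through", "NS_left", "EW_left"]

rewards = score_candidates(q_values, candidates)
advantages = group_relative_advantages(rewards)

for phase, adv in zip(candidates, advantages):
    print(f"{phase}: advantage {adv:+.3f}")
```

Under these assumptions, candidates that pick a phase the critic rates above the group average receive a positive advantage and the rest a non-positive one, so every sampled completion gets a learning signal for the state it was generated in instead of waiting for an episode-level return.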

Natural Language Processing, RLHF & Preference Learning, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 19

Year: 2026

Venue: N/A
