Mathematical Problems in Engineering
Volume 2013 (2013), Article ID 962869, 10 pages
Research Article

The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment

¹ School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510640, China
² School of Electronic Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
³ School of Port and Shipping Management, Guangzhou Marine Institute, Guangzhou 510725, China

Received 25 February 2013; Revised 12 August 2013; Accepted 26 August 2013

Academic Editor: Orwa Jaber Housheya

Copyright © 2013 Lun-Hui Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. Efficient agents, each controlling a single intersection, can be discovered automatically via multiagent reinforcement learning. However, in most previous work on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, so coordination among agents was inefficient. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) at each signalized intersection, which coordinates with neighboring TSCAs. A mathematical model of TSCA interaction is built on a nonzero-sum Markov game and is used to let the TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is then constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. Simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
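The abstract does not state the update rule, so the following is only a minimal sketch of what "updating Q-values under joint actions" can look like, assuming a tabular joint-action learner (JAL). In this scheme each TSCA keeps Q-values indexed by (state, own action, neighbor action) and estimates one neighbor's policy from observed action frequencies. This is a swapped-in, illustrative technique, not the authors' algorithm, and it does not reproduce how the paper handles imperfect information. The class name `JointActionQAgent`, the phase labels `"NS"`/`"EW"`, the discretized states `"high_NS"`/`"low"`, and the parameters `alpha` and `gamma` are all hypothetical.

```python
from collections import defaultdict


class JointActionQAgent:
    """Hypothetical TSCA sketch: tabular Q-learning over joint actions (own, neighbor).

    Assumption: the neighbor's policy is approximated by empirical action counts
    per state (joint-action learner style). Not the paper's exact update rule.
    """

    def __init__(self, actions, neighbor_actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.neighbor_actions = list(neighbor_actions)
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(float)     # (state, own_action, neighbor_action) -> Q-value
        self.counts = defaultdict(int)  # (state, neighbor_action) -> times observed

    def neighbor_prob(self, state, b):
        """Estimated probability that the neighbor plays b in this state."""
        total = sum(self.counts[(state, nb)] for nb in self.neighbor_actions)
        if total == 0:
            return 1.0 / len(self.neighbor_actions)  # uniform before any observation
        return self.counts[(state, b)] / total

    def expected_q(self, state, a):
        """Value of own action a, averaged over the modeled neighbor policy."""
        return sum(self.neighbor_prob(state, b) * self.q[(state, a, b)]
                   for b in self.neighbor_actions)

    def state_value(self, state):
        return max(self.expected_q(state, a) for a in self.actions)

    def greedy_action(self, state):
        return max(self.actions, key=lambda a: self.expected_q(state, a))

    def update(self, state, a, b, reward, next_state):
        """Record the neighbor's observed action, then do a TD update on the joint action."""
        self.counts[(state, b)] += 1
        target = reward + self.gamma * self.state_value(next_state)
        key = (state, a, b)
        self.q[key] += self.alpha * (target - self.q[key])


# Usage with hypothetical phases and states; the reward is the agent's own (nonzero-sum).
agent = JointActionQAgent(["NS", "EW"], ["NS", "EW"], alpha=0.5, gamma=0.9)
agent.update("high_NS", "NS", "EW", reward=1.0, next_state="low")
print(agent.q[("high_NS", "NS", "EW")])  # → 0.5
print(agent.greedy_action("high_NS"))    # → NS
```

Because each agent learns from its own reward signal, the sketch is compatible with a nonzero-sum game; a fuller version would add exploration (for example ε-greedy action selection), more than one neighbor, and a state encoding derived from detector data.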