Development of a Neural Network Based Q Learning Algorithm for Traffic Signal Control

Abstract:

Q learning, a reinforcement learning method, has achieved significant results in traffic signal control. However, when the state space of the Markov Decision Process is very large or continuous, the computational and memory requirements grow so large that the problem becomes intractable. This paper therefore proposes a neural network based Q learning algorithm to address this problem, known as the “Curse of Dimensionality”. Because a neural network is an effective value function approximator, the new method generalizes the conventional Q learning algorithm to very large and continuous state spaces. Experiments were carried out on an isolated intersection, and the simulation results show that the proposed method improves traffic efficiency significantly compared with the conventional Q learning algorithm.
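The idea of replacing the Q table with a neural network approximator can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not give the network architecture, state encoding, reward, or hyperparameters, so the class `QNetwork`, the function `q_learning_step`, the single tanh hidden layer, the learning rate, and the discount factor are all assumptions chosen for illustration. The sketch maps a continuous state vector (for example, normalized queue lengths, which is also an assumption) to one Q value per signal action and applies the standard Q learning target.

```python
import math
import random


class QNetwork:
    """Hypothetical one-hidden-layer network: state vector -> one Q value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.lr = lr  # assumed learning rate, not taken from the paper
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        # Hidden layer with tanh activation, then a linear output layer.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q = [sum(w * h for w, h in zip(row, hidden)) + b
             for row, b in zip(self.w2, self.b2)]
        return hidden, q

    def q_values(self, state):
        return self.forward(state)[1]

    def update(self, state, action, target):
        """One gradient step on 0.5 * (target - Q(state, action))**2."""
        hidden, q = self.forward(state)
        error = target - q[action]
        # Backpropagate through the selected output unit only,
        # using the output weights as they were before this step.
        grad_hidden = [error * self.w2[action][j] * (1.0 - hidden[j] ** 2)
                       for j in range(len(hidden))]
        for j, h in enumerate(hidden):
            self.w2[action][j] += self.lr * error * h
        self.b2[action] += self.lr * error
        for j, g in enumerate(grad_hidden):
            for i, x in enumerate(state):
                self.w1[j][i] += self.lr * g * x
            self.b1[j] += self.lr * g
        return error


def q_learning_step(net, state, action, reward, next_state, gamma=0.9):
    """Q learning update with target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(net.q_values(next_state))
    return net.update(state, action, target)
```

In use, an agent would call `q_learning_step(net, state, action, reward, next_state)` after each signal decision. Because the network shares weights across states, an update for one state also shifts the estimates for nearby states, which is the generalization property the abstract refers to.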

Info:

Pages: 91-95

Online since: December 2011
